Computer lab 3

Machine Learning: Mathematical Theory and Applications

Minh Thang Trinh (25585391)

Published

October 14, 2024

Problem 1. Deep learning for spam email data (classification)

rm(list=ls()) # Remove all variables from the workspace
cat("\014") # Clear the console
# Load file and design data
load(file = '/Users/thangtm589/Desktop/UTS/37401 Machine Learning/Computer Lab/Lab 3/spam_ham_emails.RData')
set.seed(12345)
suppressMessages(library(caret))
Spam_ham_emails[, -1] <- scale(Spam_ham_emails[, -1])
Spam_ham_emails[, 'spam'] <- as.integer(Spam_ham_emails[, 'spam'] == 1) 

# Construct dataset
train_obs <- createDataPartition(y = Spam_ham_emails$spam, p = .75, list = FALSE)
train <- as.matrix(Spam_ham_emails[train_obs, ])
y_train <- train[, 1]
X_train <- train[, -1]
test <- as.matrix(Spam_ham_emails[-train_obs, ])
y_test <- test[, 1]
X_test <- test[, -1]
suppressMessages(library(tensorflow))
suppressMessages(library(keras3))
tensorflow::tf$random$set_seed(12345)
model <- keras_model_sequential() 
model %>% 
  # Add first hidden layer
  layer_dense(units = 12, activation = 'relu', input_shape = c(15)) %>% 
  # Add regularisation via dropout to the first hidden layer
  layer_dropout(rate = 0.3) %>% 
  # Add second hidden layer
  layer_dense(units = 6, activation = 'relu') %>%
  # Add regularisation via dropout to the second hidden layer
  layer_dropout(rate = 0.3) %>%
  # Add layer that connects to the observations
  layer_dense(units = 1, activation = 'sigmoid')
summary(model)
Model: "sequential"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Layer (type)                      ┃ Output Shape             ┃       Param # ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ dense (Dense)                     │ (None, 12)               │           192 │
├───────────────────────────────────┼──────────────────────────┼───────────────┤
│ dropout (Dropout)                 │ (None, 12)               │             0 │
├───────────────────────────────────┼──────────────────────────┼───────────────┤
│ dense_1 (Dense)                   │ (None, 6)                │            78 │
├───────────────────────────────────┼──────────────────────────┼───────────────┤
│ dropout_1 (Dropout)               │ (None, 6)                │             0 │
├───────────────────────────────────┼──────────────────────────┼───────────────┤
│ dense_2 (Dense)                   │ (None, 1)                │             7 │
└───────────────────────────────────┴──────────────────────────┴───────────────┘
 Total params: 277 (1.08 KB)
 Trainable params: 277 (1.08 KB)
 Non-trainable params: 0 (0.00 B)

💪 Problem 1.1

What are the dimensions of \(\boldsymbol{q}^{(1)},\boldsymbol{q}^{(2)}, \boldsymbol{b}^{(1)},\boldsymbol{b}^{(2)}, b^{(3)}, \boldsymbol{W}^{(1)}, \boldsymbol{W}^{(2)}, \boldsymbol{W}^{(3)}\), and \(\Pr(y =1|\mathbf{x})\)?

Based on the model configuration above, we set layer_dense(units = 12) for the first hidden layer, layer_dense(units = 6) for the second hidden layer, and layer_dense(units = 1) for the output layer.

Therefore, we get:

The dimension of \(\boldsymbol{q}^{(1)}\) is \(12\times1\): the first hidden layer has 12 neurons.

The dimension of \(\boldsymbol{b}^{(1)}\) is \(12\times1\): one bias per neuron in the first hidden layer.

The dimension of \(\boldsymbol{W}^{(1)}\) is \(12\times15\): the input has 15 features, each connected to the 12 neurons of the first hidden layer.

The dimension of \(\boldsymbol{q}^{(2)}\) is \(6\times1\): the second hidden layer has 6 neurons.

The dimension of \(\boldsymbol{b}^{(2)}\) is \(6\times1\): one bias per neuron in the second hidden layer.

The dimension of \(\boldsymbol{W}^{(2)}\) is \(6\times12\): its input is the 12-dimensional output of the first hidden layer, connected to the 6 neurons of the second hidden layer.

The dimension of \(\Pr(y =1|\mathbf{x})\) is \(1\times1\): the output layer has a single unit.

The dimension of \(\boldsymbol{b}^{(3)}\) is \(1\times1\): the output layer has a single bias.

The dimension of \(\boldsymbol{W}^{(3)}\) is \(1\times6\): its input is the 6-dimensional output of the second hidden layer, connected to the single output unit.
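These dimensions can be verified directly from the model object. A minimal sketch (note that Keras stores each dense kernel with shape inputs \(\times\) units, i.e. the transpose of the \(\boldsymbol{W}^{(l)}\) convention above):

```r
# Sketch: inspect the stored weight shapes after building the model.
# Keras kernels have shape (inputs, units), the transpose of W^(l) above.
w <- get_weights(model)
dim(w[[1]])    # 15 x 12, so W^(1) = t(w[[1]]) is 12 x 15
length(w[[2]]) # 12, matching b^(1)
dim(w[[3]])    # 12 x 6, so W^(2) is 6 x 12
length(w[[4]]) # 6, matching b^(2)
dim(w[[5]])    # 6 x 1, so W^(3) is 1 x 6
length(w[[6]]) # 1, matching b^(3)
```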

💪 Problem 1.2

What is the number of parameters for each of the three equations above (\(\mathbf{x}\) and \(\boldsymbol{q}\) are not parameters)? Verify that this agrees with the output of summary(model) above.

With default setting use_bias=TRUE, we have:

The number of parameters for equation \(\boldsymbol{q}^{(1)} =h\left(\boldsymbol{W}^{(1)}\mathbf{x}+\boldsymbol{b}^{(1)}\right)\) is 192: \(12\times15=180\) for \(\boldsymbol{W}^{(1)}\) plus 12 for \(\boldsymbol{b}^{(1)}\).

The number of parameters for equation \(\boldsymbol{q}^{(2)} =h\left(\boldsymbol{W}^{(2)}\boldsymbol{q}^{(1)}+\boldsymbol{b}^{(2)}\right)\) is 78: \(6\times12=72\) for \(\boldsymbol{W}^{(2)}\) plus 6 for \(\boldsymbol{b}^{(2)}\).

The number of parameters for equation \(\Pr(y =1|\mathbf{x}) =g\left(\boldsymbol{W}^{(3)}\boldsymbol{q}^{(2)}+\boldsymbol{b}^{(3)}\right)\) is 7: \(1\times6=6\) for \(\boldsymbol{W}^{(3)}\) plus 1 for \(\boldsymbol{b}^{(3)}\).

This agrees with the Param # column in the summary(model) output above, where the first hidden layer dense shows 192, the second hidden layer dense_1 shows 78, and the output layer dense_2 shows 7. The total number of parameters in the whole network is:

\[ 192 + 78 + 7 = 277 \]
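The per-layer counts all follow the same rule: a dense layer with a bias term has (inputs + 1) × units parameters. A quick arithmetic check in R (the helper function params is introduced here only for illustration):

```r
# Parameter count of a dense layer with bias: (inputs + 1) * units
params <- function(inputs, units) (inputs + 1) * units
params(15, 12) # 192 for the first hidden layer
params(12, 6)  # 78 for the second hidden layer
params(6, 1)   # 7 for the output layer
params(15, 12) + params(12, 6) + params(6, 1) # 277 in total
```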

💪 Problem 1.3

Fit a one layer dense neural network with 8 hidden units to the spam data using the ADAM optimiser. You can use the same settings as the previous problem, but feel free to experiment. How does this model compare to the two layer dense model above?

First, we compile and fit the two-layer dense model constructed above, so that we have a baseline to compare against:

# Construct 2-layer dense model 
# Set early stopping 
early_stopping <- callback_early_stopping(monitor="val_loss", patience = 10, restore_best_weights = TRUE)

# Compile model
model %>% compile(loss = 'binary_crossentropy', optimizer = 'adam', metrics = c('accuracy', 'AUC'))

# Fit model
model_fit <- model %>% fit(X_train, y_train, epochs = 200, batch_size = 50, validation_split = 0.2, callbacks = list(early_stopping))
Epoch 1/200
56/56 - 2s - 34ms/step - AUC: 0.6004 - accuracy: 0.6207 - loss: 0.7100 - val_AUC: 0.7584 - val_accuracy: 0.6816 - val_loss: 0.6326
Epoch 2/200
56/56 - 0s - 2ms/step - AUC: 0.7426 - accuracy: 0.6685 - loss: 0.6171 - val_AUC: 0.8469 - val_accuracy: 0.7279 - val_loss: 0.5613
Epoch 3/200
56/56 - 0s - 2ms/step - AUC: 0.7914 - accuracy: 0.6880 - loss: 0.5745 - val_AUC: 0.8868 - val_accuracy: 0.7612 - val_loss: 0.5060
Epoch 4/200
56/56 - 0s - 2ms/step - AUC: 0.8310 - accuracy: 0.7312 - loss: 0.5304 - val_AUC: 0.9126 - val_accuracy: 0.8119 - val_loss: 0.4579
Epoch 5/200
56/56 - 0s - 2ms/step - AUC: 0.8749 - accuracy: 0.7899 - loss: 0.4718 - val_AUC: 0.9254 - val_accuracy: 0.8379 - val_loss: 0.4175
Epoch 6/200
56/56 - 0s - 2ms/step - AUC: 0.8835 - accuracy: 0.8072 - loss: 0.4458 - val_AUC: 0.9317 - val_accuracy: 0.8582 - val_loss: 0.3868
Epoch 7/200
56/56 - 0s - 2ms/step - AUC: 0.8914 - accuracy: 0.8163 - loss: 0.4197 - val_AUC: 0.9351 - val_accuracy: 0.8683 - val_loss: 0.3627
Epoch 8/200
56/56 - 0s - 2ms/step - AUC: 0.9149 - accuracy: 0.8514 - loss: 0.3893 - val_AUC: 0.9376 - val_accuracy: 0.8799 - val_loss: 0.3421
Epoch 9/200
56/56 - 0s - 2ms/step - AUC: 0.9135 - accuracy: 0.8482 - loss: 0.3755 - val_AUC: 0.9406 - val_accuracy: 0.8886 - val_loss: 0.3288
Epoch 10/200
56/56 - 0s - 2ms/step - AUC: 0.9241 - accuracy: 0.8601 - loss: 0.3578 - val_AUC: 0.9433 - val_accuracy: 0.8871 - val_loss: 0.3180
Epoch 11/200
56/56 - 0s - 2ms/step - AUC: 0.9323 - accuracy: 0.8663 - loss: 0.3328 - val_AUC: 0.9450 - val_accuracy: 0.8900 - val_loss: 0.3103
Epoch 12/200
56/56 - 0s - 2ms/step - AUC: 0.9352 - accuracy: 0.8754 - loss: 0.3309 - val_AUC: 0.9470 - val_accuracy: 0.8973 - val_loss: 0.3043
Epoch 13/200
56/56 - 0s - 2ms/step - AUC: 0.9336 - accuracy: 0.8801 - loss: 0.3365 - val_AUC: 0.9481 - val_accuracy: 0.8958 - val_loss: 0.2992
Epoch 14/200
56/56 - 0s - 2ms/step - AUC: 0.9392 - accuracy: 0.8804 - loss: 0.3138 - val_AUC: 0.9491 - val_accuracy: 0.8987 - val_loss: 0.2955
Epoch 15/200
56/56 - 0s - 2ms/step - AUC: 0.9398 - accuracy: 0.8855 - loss: 0.3183 - val_AUC: 0.9498 - val_accuracy: 0.8958 - val_loss: 0.2935
Epoch 16/200
56/56 - 0s - 2ms/step - AUC: 0.9440 - accuracy: 0.8855 - loss: 0.3020 - val_AUC: 0.9511 - val_accuracy: 0.8987 - val_loss: 0.2903
Epoch 17/200
56/56 - 0s - 2ms/step - AUC: 0.9385 - accuracy: 0.8830 - loss: 0.3183 - val_AUC: 0.9519 - val_accuracy: 0.8987 - val_loss: 0.2870
Epoch 18/200
56/56 - 0s - 2ms/step - AUC: 0.9408 - accuracy: 0.8884 - loss: 0.3100 - val_AUC: 0.9537 - val_accuracy: 0.8987 - val_loss: 0.2848
Epoch 19/200
56/56 - 0s - 2ms/step - AUC: 0.9455 - accuracy: 0.8880 - loss: 0.2966 - val_AUC: 0.9545 - val_accuracy: 0.9001 - val_loss: 0.2831
Epoch 20/200
56/56 - 0s - 2ms/step - AUC: 0.9436 - accuracy: 0.8862 - loss: 0.3063 - val_AUC: 0.9547 - val_accuracy: 0.9016 - val_loss: 0.2822
Epoch 21/200
56/56 - 0s - 2ms/step - AUC: 0.9503 - accuracy: 0.8924 - loss: 0.2925 - val_AUC: 0.9550 - val_accuracy: 0.9001 - val_loss: 0.2785
Epoch 22/200
56/56 - 0s - 2ms/step - AUC: 0.9497 - accuracy: 0.8946 - loss: 0.2835 - val_AUC: 0.9556 - val_accuracy: 0.9016 - val_loss: 0.2765
Epoch 23/200
56/56 - 0s - 2ms/step - AUC: 0.9458 - accuracy: 0.8902 - loss: 0.2996 - val_AUC: 0.9562 - val_accuracy: 0.9059 - val_loss: 0.2742
Epoch 24/200
56/56 - 0s - 2ms/step - AUC: 0.9491 - accuracy: 0.8920 - loss: 0.2859 - val_AUC: 0.9565 - val_accuracy: 0.9045 - val_loss: 0.2711
Epoch 25/200
56/56 - 0s - 2ms/step - AUC: 0.9503 - accuracy: 0.9014 - loss: 0.2848 - val_AUC: 0.9567 - val_accuracy: 0.9059 - val_loss: 0.2689
Epoch 26/200
56/56 - 0s - 3ms/step - AUC: 0.9526 - accuracy: 0.8960 - loss: 0.2801 - val_AUC: 0.9571 - val_accuracy: 0.9045 - val_loss: 0.2673
Epoch 27/200
56/56 - 0s - 2ms/step - AUC: 0.9518 - accuracy: 0.8978 - loss: 0.2848 - val_AUC: 0.9578 - val_accuracy: 0.9059 - val_loss: 0.2651
Epoch 28/200
56/56 - 0s - 2ms/step - AUC: 0.9481 - accuracy: 0.8946 - loss: 0.2897 - val_AUC: 0.9581 - val_accuracy: 0.9059 - val_loss: 0.2642
Epoch 29/200
56/56 - 0s - 2ms/step - AUC: 0.9504 - accuracy: 0.8909 - loss: 0.2889 - val_AUC: 0.9581 - val_accuracy: 0.9088 - val_loss: 0.2646
Epoch 30/200
56/56 - 0s - 2ms/step - AUC: 0.9514 - accuracy: 0.8938 - loss: 0.2831 - val_AUC: 0.9586 - val_accuracy: 0.9059 - val_loss: 0.2642
Epoch 31/200
56/56 - 0s - 2ms/step - AUC: 0.9549 - accuracy: 0.8957 - loss: 0.2750 - val_AUC: 0.9589 - val_accuracy: 0.9117 - val_loss: 0.2631
Epoch 32/200
56/56 - 0s - 2ms/step - AUC: 0.9505 - accuracy: 0.8938 - loss: 0.2839 - val_AUC: 0.9590 - val_accuracy: 0.9146 - val_loss: 0.2622
Epoch 33/200
56/56 - 0s - 2ms/step - AUC: 0.9576 - accuracy: 0.9000 - loss: 0.2648 - val_AUC: 0.9595 - val_accuracy: 0.9146 - val_loss: 0.2610
Epoch 34/200
56/56 - 0s - 2ms/step - AUC: 0.9521 - accuracy: 0.8982 - loss: 0.2804 - val_AUC: 0.9600 - val_accuracy: 0.9146 - val_loss: 0.2595
Epoch 35/200
56/56 - 0s - 2ms/step - AUC: 0.9542 - accuracy: 0.8996 - loss: 0.2763 - val_AUC: 0.9602 - val_accuracy: 0.9146 - val_loss: 0.2591
Epoch 36/200
56/56 - 0s - 2ms/step - AUC: 0.9559 - accuracy: 0.9018 - loss: 0.2743 - val_AUC: 0.9604 - val_accuracy: 0.9161 - val_loss: 0.2592
Epoch 37/200
56/56 - 0s - 2ms/step - AUC: 0.9570 - accuracy: 0.9040 - loss: 0.2708 - val_AUC: 0.9604 - val_accuracy: 0.9161 - val_loss: 0.2583
Epoch 38/200
56/56 - 0s - 2ms/step - AUC: 0.9554 - accuracy: 0.8982 - loss: 0.2743 - val_AUC: 0.9609 - val_accuracy: 0.9146 - val_loss: 0.2569
Epoch 39/200
56/56 - 0s - 2ms/step - AUC: 0.9524 - accuracy: 0.9018 - loss: 0.2813 - val_AUC: 0.9611 - val_accuracy: 0.9146 - val_loss: 0.2567
Epoch 40/200
56/56 - 0s - 2ms/step - AUC: 0.9544 - accuracy: 0.8953 - loss: 0.2724 - val_AUC: 0.9617 - val_accuracy: 0.9161 - val_loss: 0.2564
Epoch 41/200
56/56 - 0s - 2ms/step - AUC: 0.9610 - accuracy: 0.9080 - loss: 0.2575 - val_AUC: 0.9615 - val_accuracy: 0.9161 - val_loss: 0.2559
Epoch 42/200
56/56 - 0s - 2ms/step - AUC: 0.9571 - accuracy: 0.9000 - loss: 0.2657 - val_AUC: 0.9620 - val_accuracy: 0.9161 - val_loss: 0.2562
Epoch 43/200
56/56 - 0s - 2ms/step - AUC: 0.9559 - accuracy: 0.9000 - loss: 0.2742 - val_AUC: 0.9613 - val_accuracy: 0.9161 - val_loss: 0.2567
Epoch 44/200
56/56 - 0s - 2ms/step - AUC: 0.9593 - accuracy: 0.9000 - loss: 0.2630 - val_AUC: 0.9619 - val_accuracy: 0.9161 - val_loss: 0.2562
Epoch 45/200
56/56 - 0s - 2ms/step - AUC: 0.9557 - accuracy: 0.9029 - loss: 0.2732 - val_AUC: 0.9625 - val_accuracy: 0.9161 - val_loss: 0.2558
Epoch 46/200
56/56 - 0s - 2ms/step - AUC: 0.9611 - accuracy: 0.9051 - loss: 0.2520 - val_AUC: 0.9624 - val_accuracy: 0.9161 - val_loss: 0.2560
Epoch 47/200
56/56 - 0s - 2ms/step - AUC: 0.9625 - accuracy: 0.9069 - loss: 0.2452 - val_AUC: 0.9626 - val_accuracy: 0.9161 - val_loss: 0.2551
Epoch 48/200
56/56 - 0s - 2ms/step - AUC: 0.9600 - accuracy: 0.9069 - loss: 0.2550 - val_AUC: 0.9627 - val_accuracy: 0.9190 - val_loss: 0.2554
Epoch 49/200
56/56 - 0s - 2ms/step - AUC: 0.9587 - accuracy: 0.9018 - loss: 0.2591 - val_AUC: 0.9628 - val_accuracy: 0.9175 - val_loss: 0.2548
Epoch 50/200
56/56 - 0s - 2ms/step - AUC: 0.9622 - accuracy: 0.9054 - loss: 0.2447 - val_AUC: 0.9626 - val_accuracy: 0.9175 - val_loss: 0.2558
Epoch 51/200
56/56 - 0s - 2ms/step - AUC: 0.9599 - accuracy: 0.9062 - loss: 0.2549 - val_AUC: 0.9628 - val_accuracy: 0.9175 - val_loss: 0.2543
Epoch 52/200
56/56 - 0s - 2ms/step - AUC: 0.9577 - accuracy: 0.9040 - loss: 0.2660 - val_AUC: 0.9633 - val_accuracy: 0.9175 - val_loss: 0.2525
Epoch 53/200
56/56 - 0s - 2ms/step - AUC: 0.9616 - accuracy: 0.9054 - loss: 0.2531 - val_AUC: 0.9634 - val_accuracy: 0.9175 - val_loss: 0.2519
Epoch 54/200
56/56 - 0s - 2ms/step - AUC: 0.9604 - accuracy: 0.9083 - loss: 0.2595 - val_AUC: 0.9633 - val_accuracy: 0.9161 - val_loss: 0.2519
Epoch 55/200
56/56 - 0s - 2ms/step - AUC: 0.9623 - accuracy: 0.9000 - loss: 0.2493 - val_AUC: 0.9636 - val_accuracy: 0.9146 - val_loss: 0.2523
Epoch 56/200
56/56 - 0s - 2ms/step - AUC: 0.9631 - accuracy: 0.9109 - loss: 0.2443 - val_AUC: 0.9635 - val_accuracy: 0.9161 - val_loss: 0.2531
Epoch 57/200
56/56 - 0s - 2ms/step - AUC: 0.9639 - accuracy: 0.9076 - loss: 0.2448 - val_AUC: 0.9637 - val_accuracy: 0.9146 - val_loss: 0.2520
Epoch 58/200
56/56 - 0s - 2ms/step - AUC: 0.9620 - accuracy: 0.9058 - loss: 0.2471 - val_AUC: 0.9638 - val_accuracy: 0.9146 - val_loss: 0.2505
Epoch 59/200
56/56 - 0s - 2ms/step - AUC: 0.9637 - accuracy: 0.9123 - loss: 0.2426 - val_AUC: 0.9640 - val_accuracy: 0.9146 - val_loss: 0.2499
Epoch 60/200
56/56 - 0s - 2ms/step - AUC: 0.9615 - accuracy: 0.9000 - loss: 0.2559 - val_AUC: 0.9640 - val_accuracy: 0.9146 - val_loss: 0.2497
Epoch 61/200
56/56 - 0s - 2ms/step - AUC: 0.9603 - accuracy: 0.9022 - loss: 0.2533 - val_AUC: 0.9641 - val_accuracy: 0.9132 - val_loss: 0.2498
Epoch 62/200
56/56 - 0s - 2ms/step - AUC: 0.9602 - accuracy: 0.8996 - loss: 0.2596 - val_AUC: 0.9642 - val_accuracy: 0.9146 - val_loss: 0.2487
Epoch 63/200
56/56 - 0s - 2ms/step - AUC: 0.9612 - accuracy: 0.9072 - loss: 0.2547 - val_AUC: 0.9639 - val_accuracy: 0.9132 - val_loss: 0.2483
Epoch 64/200
56/56 - 0s - 2ms/step - AUC: 0.9605 - accuracy: 0.9022 - loss: 0.2489 - val_AUC: 0.9644 - val_accuracy: 0.9132 - val_loss: 0.2489
Epoch 65/200
56/56 - 0s - 2ms/step - AUC: 0.9638 - accuracy: 0.9123 - loss: 0.2535 - val_AUC: 0.9645 - val_accuracy: 0.9132 - val_loss: 0.2479
Epoch 66/200
56/56 - 0s - 2ms/step - AUC: 0.9618 - accuracy: 0.9076 - loss: 0.2487 - val_AUC: 0.9645 - val_accuracy: 0.9132 - val_loss: 0.2477
Epoch 67/200
56/56 - 0s - 2ms/step - AUC: 0.9627 - accuracy: 0.9065 - loss: 0.2505 - val_AUC: 0.9648 - val_accuracy: 0.9132 - val_loss: 0.2465
Epoch 68/200
56/56 - 0s - 2ms/step - AUC: 0.9647 - accuracy: 0.9087 - loss: 0.2386 - val_AUC: 0.9650 - val_accuracy: 0.9132 - val_loss: 0.2460
Epoch 69/200
56/56 - 0s - 2ms/step - AUC: 0.9656 - accuracy: 0.9127 - loss: 0.2398 - val_AUC: 0.9653 - val_accuracy: 0.9146 - val_loss: 0.2447
Epoch 70/200
56/56 - 0s - 2ms/step - AUC: 0.9648 - accuracy: 0.9130 - loss: 0.2422 - val_AUC: 0.9652 - val_accuracy: 0.9132 - val_loss: 0.2446
Epoch 71/200
56/56 - 0s - 2ms/step - AUC: 0.9615 - accuracy: 0.9072 - loss: 0.2539 - val_AUC: 0.9655 - val_accuracy: 0.9146 - val_loss: 0.2437
Epoch 72/200
56/56 - 0s - 2ms/step - AUC: 0.9629 - accuracy: 0.9054 - loss: 0.2463 - val_AUC: 0.9655 - val_accuracy: 0.9146 - val_loss: 0.2428
Epoch 73/200
56/56 - 0s - 2ms/step - AUC: 0.9665 - accuracy: 0.9127 - loss: 0.2296 - val_AUC: 0.9654 - val_accuracy: 0.9146 - val_loss: 0.2433
Epoch 74/200
56/56 - 0s - 2ms/step - AUC: 0.9646 - accuracy: 0.9087 - loss: 0.2413 - val_AUC: 0.9655 - val_accuracy: 0.9132 - val_loss: 0.2432
Epoch 75/200
56/56 - 0s - 2ms/step - AUC: 0.9654 - accuracy: 0.9116 - loss: 0.2365 - val_AUC: 0.9653 - val_accuracy: 0.9146 - val_loss: 0.2436
Epoch 76/200
56/56 - 0s - 2ms/step - AUC: 0.9647 - accuracy: 0.9105 - loss: 0.2357 - val_AUC: 0.9658 - val_accuracy: 0.9146 - val_loss: 0.2420
Epoch 77/200
56/56 - 0s - 2ms/step - AUC: 0.9608 - accuracy: 0.9076 - loss: 0.2577 - val_AUC: 0.9660 - val_accuracy: 0.9146 - val_loss: 0.2419
Epoch 78/200
56/56 - 0s - 2ms/step - AUC: 0.9635 - accuracy: 0.9072 - loss: 0.2410 - val_AUC: 0.9658 - val_accuracy: 0.9132 - val_loss: 0.2419
Epoch 79/200
56/56 - 0s - 3ms/step - AUC: 0.9646 - accuracy: 0.9149 - loss: 0.2413 - val_AUC: 0.9658 - val_accuracy: 0.9132 - val_loss: 0.2415
Epoch 80/200
56/56 - 0s - 2ms/step - AUC: 0.9654 - accuracy: 0.9123 - loss: 0.2382 - val_AUC: 0.9659 - val_accuracy: 0.9132 - val_loss: 0.2418
Epoch 81/200
56/56 - 0s - 2ms/step - AUC: 0.9668 - accuracy: 0.9098 - loss: 0.2350 - val_AUC: 0.9659 - val_accuracy: 0.9146 - val_loss: 0.2407
Epoch 82/200
56/56 - 0s - 3ms/step - AUC: 0.9647 - accuracy: 0.9083 - loss: 0.2366 - val_AUC: 0.9660 - val_accuracy: 0.9132 - val_loss: 0.2414
Epoch 83/200
56/56 - 0s - 2ms/step - AUC: 0.9631 - accuracy: 0.9083 - loss: 0.2420 - val_AUC: 0.9659 - val_accuracy: 0.9117 - val_loss: 0.2417
Epoch 84/200
56/56 - 0s - 2ms/step - AUC: 0.9665 - accuracy: 0.9076 - loss: 0.2325 - val_AUC: 0.9657 - val_accuracy: 0.9146 - val_loss: 0.2421
Epoch 85/200
56/56 - 0s - 2ms/step - AUC: 0.9650 - accuracy: 0.9072 - loss: 0.2389 - val_AUC: 0.9657 - val_accuracy: 0.9132 - val_loss: 0.2429
Epoch 86/200
56/56 - 0s - 2ms/step - AUC: 0.9634 - accuracy: 0.9109 - loss: 0.2436 - val_AUC: 0.9660 - val_accuracy: 0.9146 - val_loss: 0.2412
Epoch 87/200
56/56 - 0s - 2ms/step - AUC: 0.9655 - accuracy: 0.9116 - loss: 0.2405 - val_AUC: 0.9658 - val_accuracy: 0.9132 - val_loss: 0.2414
Epoch 88/200
56/56 - 0s - 2ms/step - AUC: 0.9673 - accuracy: 0.9134 - loss: 0.2297 - val_AUC: 0.9658 - val_accuracy: 0.9132 - val_loss: 0.2416
Epoch 89/200
56/56 - 0s - 2ms/step - AUC: 0.9661 - accuracy: 0.9105 - loss: 0.2345 - val_AUC: 0.9658 - val_accuracy: 0.9132 - val_loss: 0.2412
Epoch 90/200
56/56 - 0s - 2ms/step - AUC: 0.9645 - accuracy: 0.9145 - loss: 0.2400 - val_AUC: 0.9654 - val_accuracy: 0.9146 - val_loss: 0.2416
Epoch 91/200
56/56 - 0s - 2ms/step - AUC: 0.9656 - accuracy: 0.9087 - loss: 0.2351 - val_AUC: 0.9659 - val_accuracy: 0.9132 - val_loss: 0.2404
Epoch 92/200
56/56 - 0s - 2ms/step - AUC: 0.9685 - accuracy: 0.9138 - loss: 0.2252 - val_AUC: 0.9659 - val_accuracy: 0.9132 - val_loss: 0.2411
Epoch 93/200
56/56 - 0s - 3ms/step - AUC: 0.9644 - accuracy: 0.9105 - loss: 0.2414 - val_AUC: 0.9655 - val_accuracy: 0.9161 - val_loss: 0.2422
Epoch 94/200
56/56 - 0s - 2ms/step - AUC: 0.9674 - accuracy: 0.9149 - loss: 0.2281 - val_AUC: 0.9657 - val_accuracy: 0.9132 - val_loss: 0.2399
Epoch 95/200
56/56 - 0s - 3ms/step - AUC: 0.9646 - accuracy: 0.9098 - loss: 0.2327 - val_AUC: 0.9661 - val_accuracy: 0.9146 - val_loss: 0.2394
Epoch 96/200
56/56 - 0s - 2ms/step - AUC: 0.9646 - accuracy: 0.9105 - loss: 0.2436 - val_AUC: 0.9659 - val_accuracy: 0.9146 - val_loss: 0.2386
Epoch 97/200
56/56 - 0s - 3ms/step - AUC: 0.9662 - accuracy: 0.9167 - loss: 0.2394 - val_AUC: 0.9661 - val_accuracy: 0.9146 - val_loss: 0.2366
Epoch 98/200
56/56 - 0s - 2ms/step - AUC: 0.9689 - accuracy: 0.9196 - loss: 0.2212 - val_AUC: 0.9662 - val_accuracy: 0.9146 - val_loss: 0.2366
Epoch 99/200
56/56 - 0s - 2ms/step - AUC: 0.9664 - accuracy: 0.9138 - loss: 0.2337 - val_AUC: 0.9665 - val_accuracy: 0.9132 - val_loss: 0.2363
Epoch 100/200
56/56 - 0s - 2ms/step - AUC: 0.9668 - accuracy: 0.9159 - loss: 0.2350 - val_AUC: 0.9663 - val_accuracy: 0.9132 - val_loss: 0.2357
Epoch 101/200
56/56 - 0s - 2ms/step - AUC: 0.9670 - accuracy: 0.9196 - loss: 0.2257 - val_AUC: 0.9661 - val_accuracy: 0.9146 - val_loss: 0.2373
Epoch 102/200
56/56 - 0s - 2ms/step - AUC: 0.9646 - accuracy: 0.9105 - loss: 0.2401 - val_AUC: 0.9663 - val_accuracy: 0.9146 - val_loss: 0.2357
Epoch 103/200
56/56 - 0s - 2ms/step - AUC: 0.9647 - accuracy: 0.9199 - loss: 0.2360 - val_AUC: 0.9661 - val_accuracy: 0.9146 - val_loss: 0.2355
Epoch 104/200
56/56 - 0s - 2ms/step - AUC: 0.9665 - accuracy: 0.9181 - loss: 0.2372 - val_AUC: 0.9660 - val_accuracy: 0.9146 - val_loss: 0.2356
Epoch 105/200
56/56 - 0s - 2ms/step - AUC: 0.9672 - accuracy: 0.9188 - loss: 0.2310 - val_AUC: 0.9663 - val_accuracy: 0.9146 - val_loss: 0.2353
Epoch 106/200
56/56 - 0s - 2ms/step - AUC: 0.9672 - accuracy: 0.9156 - loss: 0.2269 - val_AUC: 0.9665 - val_accuracy: 0.9146 - val_loss: 0.2339
Epoch 107/200
56/56 - 0s - 2ms/step - AUC: 0.9666 - accuracy: 0.9141 - loss: 0.2276 - val_AUC: 0.9664 - val_accuracy: 0.9161 - val_loss: 0.2347
Epoch 108/200
56/56 - 0s - 2ms/step - AUC: 0.9674 - accuracy: 0.9181 - loss: 0.2248 - val_AUC: 0.9663 - val_accuracy: 0.9146 - val_loss: 0.2359
Epoch 109/200
56/56 - 0s - 2ms/step - AUC: 0.9672 - accuracy: 0.9203 - loss: 0.2279 - val_AUC: 0.9661 - val_accuracy: 0.9175 - val_loss: 0.2366
Epoch 110/200
56/56 - 0s - 2ms/step - AUC: 0.9665 - accuracy: 0.9174 - loss: 0.2290 - val_AUC: 0.9666 - val_accuracy: 0.9175 - val_loss: 0.2357
Epoch 111/200
56/56 - 0s - 2ms/step - AUC: 0.9670 - accuracy: 0.9210 - loss: 0.2303 - val_AUC: 0.9669 - val_accuracy: 0.9161 - val_loss: 0.2342
Epoch 112/200
56/56 - 0s - 2ms/step - AUC: 0.9677 - accuracy: 0.9163 - loss: 0.2255 - val_AUC: 0.9669 - val_accuracy: 0.9190 - val_loss: 0.2332
Epoch 113/200
56/56 - 0s - 2ms/step - AUC: 0.9696 - accuracy: 0.9228 - loss: 0.2178 - val_AUC: 0.9668 - val_accuracy: 0.9175 - val_loss: 0.2343
Epoch 114/200
56/56 - 0s - 2ms/step - AUC: 0.9673 - accuracy: 0.9210 - loss: 0.2261 - val_AUC: 0.9667 - val_accuracy: 0.9161 - val_loss: 0.2349
Epoch 115/200
56/56 - 0s - 2ms/step - AUC: 0.9654 - accuracy: 0.9167 - loss: 0.2380 - val_AUC: 0.9667 - val_accuracy: 0.9175 - val_loss: 0.2344
Epoch 116/200
56/56 - 0s - 2ms/step - AUC: 0.9692 - accuracy: 0.9185 - loss: 0.2220 - val_AUC: 0.9668 - val_accuracy: 0.9175 - val_loss: 0.2337
Epoch 117/200
56/56 - 0s - 2ms/step - AUC: 0.9653 - accuracy: 0.9105 - loss: 0.2351 - val_AUC: 0.9674 - val_accuracy: 0.9204 - val_loss: 0.2330
Epoch 118/200
56/56 - 0s - 2ms/step - AUC: 0.9697 - accuracy: 0.9174 - loss: 0.2232 - val_AUC: 0.9669 - val_accuracy: 0.9204 - val_loss: 0.2330
Epoch 119/200
56/56 - 0s - 2ms/step - AUC: 0.9667 - accuracy: 0.9138 - loss: 0.2299 - val_AUC: 0.9669 - val_accuracy: 0.9233 - val_loss: 0.2334
Epoch 120/200
56/56 - 0s - 2ms/step - AUC: 0.9655 - accuracy: 0.9156 - loss: 0.2333 - val_AUC: 0.9670 - val_accuracy: 0.9219 - val_loss: 0.2336
Epoch 121/200
56/56 - 0s - 2ms/step - AUC: 0.9674 - accuracy: 0.9181 - loss: 0.2255 - val_AUC: 0.9669 - val_accuracy: 0.9204 - val_loss: 0.2340
Epoch 122/200
56/56 - 0s - 2ms/step - AUC: 0.9655 - accuracy: 0.9203 - loss: 0.2344 - val_AUC: 0.9671 - val_accuracy: 0.9204 - val_loss: 0.2330
Epoch 123/200
56/56 - 0s - 2ms/step - AUC: 0.9704 - accuracy: 0.9261 - loss: 0.2191 - val_AUC: 0.9673 - val_accuracy: 0.9219 - val_loss: 0.2325
Epoch 124/200
56/56 - 0s - 2ms/step - AUC: 0.9674 - accuracy: 0.9167 - loss: 0.2249 - val_AUC: 0.9675 - val_accuracy: 0.9204 - val_loss: 0.2319
Epoch 125/200
56/56 - 0s - 2ms/step - AUC: 0.9664 - accuracy: 0.9203 - loss: 0.2303 - val_AUC: 0.9676 - val_accuracy: 0.9204 - val_loss: 0.2305
Epoch 126/200
56/56 - 0s - 2ms/step - AUC: 0.9678 - accuracy: 0.9225 - loss: 0.2266 - val_AUC: 0.9672 - val_accuracy: 0.9175 - val_loss: 0.2309
Epoch 127/200
56/56 - 0s - 2ms/step - AUC: 0.9664 - accuracy: 0.9178 - loss: 0.2288 - val_AUC: 0.9672 - val_accuracy: 0.9190 - val_loss: 0.2313
Epoch 128/200
56/56 - 0s - 2ms/step - AUC: 0.9653 - accuracy: 0.9225 - loss: 0.2320 - val_AUC: 0.9677 - val_accuracy: 0.9190 - val_loss: 0.2312
Epoch 129/200
56/56 - 0s - 2ms/step - AUC: 0.9704 - accuracy: 0.9246 - loss: 0.2190 - val_AUC: 0.9679 - val_accuracy: 0.9190 - val_loss: 0.2304
Epoch 130/200
56/56 - 0s - 2ms/step - AUC: 0.9671 - accuracy: 0.9196 - loss: 0.2230 - val_AUC: 0.9680 - val_accuracy: 0.9204 - val_loss: 0.2304
Epoch 131/200
56/56 - 0s - 2ms/step - AUC: 0.9674 - accuracy: 0.9272 - loss: 0.2243 - val_AUC: 0.9678 - val_accuracy: 0.9204 - val_loss: 0.2309
Epoch 132/200
56/56 - 0s - 2ms/step - AUC: 0.9687 - accuracy: 0.9181 - loss: 0.2227 - val_AUC: 0.9677 - val_accuracy: 0.9204 - val_loss: 0.2327
Epoch 133/200
56/56 - 0s - 2ms/step - AUC: 0.9659 - accuracy: 0.9163 - loss: 0.2347 - val_AUC: 0.9674 - val_accuracy: 0.9204 - val_loss: 0.2304
Epoch 134/200
56/56 - 0s - 2ms/step - AUC: 0.9695 - accuracy: 0.9228 - loss: 0.2240 - val_AUC: 0.9678 - val_accuracy: 0.9175 - val_loss: 0.2303
Epoch 135/200
56/56 - 0s - 2ms/step - AUC: 0.9671 - accuracy: 0.9174 - loss: 0.2279 - val_AUC: 0.9678 - val_accuracy: 0.9190 - val_loss: 0.2295
Epoch 136/200
56/56 - 0s - 2ms/step - AUC: 0.9697 - accuracy: 0.9221 - loss: 0.2198 - val_AUC: 0.9680 - val_accuracy: 0.9190 - val_loss: 0.2289
Epoch 137/200
56/56 - 0s - 2ms/step - AUC: 0.9640 - accuracy: 0.9181 - loss: 0.2391 - val_AUC: 0.9683 - val_accuracy: 0.9204 - val_loss: 0.2276
Epoch 138/200
56/56 - 0s - 2ms/step - AUC: 0.9677 - accuracy: 0.9221 - loss: 0.2265 - val_AUC: 0.9678 - val_accuracy: 0.9190 - val_loss: 0.2277
Epoch 139/200
56/56 - 0s - 2ms/step - AUC: 0.9677 - accuracy: 0.9203 - loss: 0.2261 - val_AUC: 0.9676 - val_accuracy: 0.9190 - val_loss: 0.2275
Epoch 140/200
56/56 - 0s - 2ms/step - AUC: 0.9671 - accuracy: 0.9192 - loss: 0.2290 - val_AUC: 0.9679 - val_accuracy: 0.9204 - val_loss: 0.2291
Epoch 141/200
56/56 - 0s - 2ms/step - AUC: 0.9677 - accuracy: 0.9221 - loss: 0.2262 - val_AUC: 0.9683 - val_accuracy: 0.9204 - val_loss: 0.2273
Epoch 142/200
56/56 - 0s - 2ms/step - AUC: 0.9710 - accuracy: 0.9221 - loss: 0.2184 - val_AUC: 0.9681 - val_accuracy: 0.9190 - val_loss: 0.2274
Epoch 143/200
56/56 - 0s - 2ms/step - AUC: 0.9692 - accuracy: 0.9243 - loss: 0.2188 - val_AUC: 0.9681 - val_accuracy: 0.9190 - val_loss: 0.2280
Epoch 144/200
56/56 - 0s - 2ms/step - AUC: 0.9658 - accuracy: 0.9109 - loss: 0.2320 - val_AUC: 0.9682 - val_accuracy: 0.9175 - val_loss: 0.2269
Epoch 145/200
56/56 - 0s - 2ms/step - AUC: 0.9665 - accuracy: 0.9192 - loss: 0.2295 - val_AUC: 0.9681 - val_accuracy: 0.9204 - val_loss: 0.2282
Epoch 146/200
56/56 - 0s - 3ms/step - AUC: 0.9694 - accuracy: 0.9203 - loss: 0.2207 - val_AUC: 0.9684 - val_accuracy: 0.9219 - val_loss: 0.2261
Epoch 147/200
56/56 - 0s - 2ms/step - AUC: 0.9698 - accuracy: 0.9236 - loss: 0.2148 - val_AUC: 0.9686 - val_accuracy: 0.9219 - val_loss: 0.2246
Epoch 148/200
56/56 - 0s - 2ms/step - AUC: 0.9695 - accuracy: 0.9210 - loss: 0.2207 - val_AUC: 0.9683 - val_accuracy: 0.9204 - val_loss: 0.2246
Epoch 149/200
56/56 - 0s - 2ms/step - AUC: 0.9662 - accuracy: 0.9159 - loss: 0.2282 - val_AUC: 0.9682 - val_accuracy: 0.9204 - val_loss: 0.2256
Epoch 150/200
56/56 - 0s - 2ms/step - AUC: 0.9690 - accuracy: 0.9236 - loss: 0.2171 - val_AUC: 0.9686 - val_accuracy: 0.9219 - val_loss: 0.2244
Epoch 151/200
56/56 - 0s - 2ms/step - AUC: 0.9683 - accuracy: 0.9199 - loss: 0.2236 - val_AUC: 0.9686 - val_accuracy: 0.9204 - val_loss: 0.2256
Epoch 152/200
56/56 - 0s - 2ms/step - AUC: 0.9692 - accuracy: 0.9239 - loss: 0.2174 - val_AUC: 0.9687 - val_accuracy: 0.9204 - val_loss: 0.2238
Epoch 153/200
56/56 - 0s - 2ms/step - AUC: 0.9681 - accuracy: 0.9254 - loss: 0.2259 - val_AUC: 0.9683 - val_accuracy: 0.9190 - val_loss: 0.2259
Epoch 154/200
56/56 - 0s - 2ms/step - AUC: 0.9686 - accuracy: 0.9181 - loss: 0.2222 - val_AUC: 0.9685 - val_accuracy: 0.9204 - val_loss: 0.2242
Epoch 155/200
56/56 - 0s - 2ms/step - AUC: 0.9674 - accuracy: 0.9185 - loss: 0.2223 - val_AUC: 0.9683 - val_accuracy: 0.9190 - val_loss: 0.2261
Epoch 156/200
56/56 - 0s - 2ms/step - AUC: 0.9706 - accuracy: 0.9236 - loss: 0.2119 - val_AUC: 0.9683 - val_accuracy: 0.9204 - val_loss: 0.2255
Epoch 157/200
56/56 - 0s - 2ms/step - AUC: 0.9674 - accuracy: 0.9185 - loss: 0.2255 - val_AUC: 0.9688 - val_accuracy: 0.9219 - val_loss: 0.2256
Epoch 158/200
56/56 - 0s - 2ms/step - AUC: 0.9684 - accuracy: 0.9207 - loss: 0.2198 - val_AUC: 0.9690 - val_accuracy: 0.9204 - val_loss: 0.2251
Epoch 159/200
56/56 - 0s - 2ms/step - AUC: 0.9692 - accuracy: 0.9290 - loss: 0.2183 - val_AUC: 0.9692 - val_accuracy: 0.9204 - val_loss: 0.2229
Epoch 160/200
56/56 - 0s - 2ms/step - AUC: 0.9677 - accuracy: 0.9214 - loss: 0.2251 - val_AUC: 0.9691 - val_accuracy: 0.9190 - val_loss: 0.2233
Epoch 161/200
56/56 - 0s - 2ms/step - AUC: 0.9677 - accuracy: 0.9214 - loss: 0.2254 - val_AUC: 0.9689 - val_accuracy: 0.9190 - val_loss: 0.2239
Epoch 162/200
56/56 - 0s - 2ms/step - AUC: 0.9714 - accuracy: 0.9221 - loss: 0.2175 - val_AUC: 0.9689 - val_accuracy: 0.9190 - val_loss: 0.2236
Epoch 163/200
56/56 - 0s - 2ms/step - AUC: 0.9673 - accuracy: 0.9239 - loss: 0.2248 - val_AUC: 0.9690 - val_accuracy: 0.9190 - val_loss: 0.2236
Epoch 164/200
56/56 - 0s - 2ms/step - AUC: 0.9679 - accuracy: 0.9217 - loss: 0.2205 - val_AUC: 0.9693 - val_accuracy: 0.9219 - val_loss: 0.2219
Epoch 165/200
56/56 - 0s - 2ms/step - AUC: 0.9666 - accuracy: 0.9214 - loss: 0.2314 - val_AUC: 0.9693 - val_accuracy: 0.9204 - val_loss: 0.2220
Epoch 166/200
56/56 - 0s - 2ms/step - AUC: 0.9687 - accuracy: 0.9188 - loss: 0.2256 - val_AUC: 0.9693 - val_accuracy: 0.9190 - val_loss: 0.2209
Epoch 167/200
56/56 - 0s - 2ms/step - AUC: 0.9705 - accuracy: 0.9297 - loss: 0.2127 - val_AUC: 0.9694 - val_accuracy: 0.9219 - val_loss: 0.2198
Epoch 168/200
56/56 - 0s - 2ms/step - AUC: 0.9719 - accuracy: 0.9257 - loss: 0.2076 - val_AUC: 0.9692 - val_accuracy: 0.9190 - val_loss: 0.2215
... (epochs 169–184 omitted; val_loss stays in a narrow 0.217–0.223 band) ...
Epoch 185/200
56/56 - 0s - 2ms/step - AUC: 0.9690 - accuracy: 0.9232 - loss: 0.2186 - val_AUC: 0.9697 - val_accuracy: 0.9204 - val_loss: 0.2181
plot(model_fit)

suppressMessages(library(pROC))
# Evaluating on training data
results_train_model <- model %>% evaluate(X_train, y_train)
108/108 - 0s - 943us/step - AUC: 0.9782 - accuracy: 0.9386 - loss: 0.1795
print(results_train_model)
$AUC
[1] 0.9782313

$accuracy
[1] 0.9385685

$loss
[1] 0.1795432
results_test_model <- model %>% evaluate(X_test, y_test)
36/36 - 0s - 1ms/step - AUC: 0.9721 - accuracy: 0.9296 - loss: 0.1938
print(results_test_model)
$AUC
[1] 0.9720839

$accuracy
[1] 0.9295652

$loss
[1] 0.1937838
# Predictions on test data
y_prob_hat_test <- model %>% predict(X_test)
36/36 - 0s - 3ms/step
threshold <- 0.5 # Predict spam if probability > threshold
y_hat_test <- as.factor(y_prob_hat_test > threshold)
levels(y_hat_test) <- c("not spam", "spam")
test_spam <- as.factor(test[, 1])
levels(test_spam) <- c("not spam", "spam")
confusionMatrix(data = y_hat_test, test_spam, positive = "spam")
Confusion Matrix and Statistics

          Reference
Prediction not spam spam
  not spam      669   51
  spam           30  400
                                          
               Accuracy : 0.9296          
                 95% CI : (0.9132, 0.9437)
    No Information Rate : 0.6078          
    P-Value [Acc > NIR] : < 2e-16         
                                          
                  Kappa : 0.851           
                                          
 Mcnemar's Test P-Value : 0.02627         
                                          
            Sensitivity : 0.8869          
            Specificity : 0.9571          
         Pos Pred Value : 0.9302          
         Neg Pred Value : 0.9292          
             Prevalence : 0.3922          
         Detection Rate : 0.3478          
   Detection Prevalence : 0.3739          
      Balanced Accuracy : 0.9220          
                                          
       'Positive' Class : spam            
                                          
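As a quick sanity check, the headline statistics above can be recomputed directly from the four cell counts of the confusion matrix (a base-R sketch using the counts printed above, with "spam" as the positive class):

```r
# Cell counts from the confusion matrix above
TP <- 400; FN <- 51; FP <- 30; TN <- 669
sensitivity <- TP / (TP + FN)                  # 400/451,   about 0.8869
specificity <- TN / (TN + FP)                  # 669/699,   about 0.9571
accuracy    <- (TP + TN) / (TP + TN + FP + FN) # 1069/1150, about 0.9296
round(c(sensitivity = sensitivity, specificity = specificity, accuracy = accuracy), 4)
```

These match the values reported by `confusionMatrix()`.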
# ROC curve
par(pty="s")
roc_obj <- roc(response = test_spam, predictor = as.vector(y_prob_hat_test), print.auc = TRUE, percent=TRUE)
Setting levels: control = not spam, case = spam
Setting direction: controls < cases
plot(roc_obj, legacy.axes = TRUE, percent=TRUE, col = "cornflowerblue", main = "ROC spam email classifiers", print.auc=TRUE, print.auc.pattern = "AUC: %0.3f%%", auc.polygon=TRUE)

Second, we construct a one-hidden-layer dense model with 8 hidden units, keeping the Adam optimiser and otherwise the same settings as the previous model so that the two architectures can be compared directly:

# Construct 1-layer dense model 
# Define the model 
model_1 <- keras_model_sequential() 
model_1 %>% 
  # Add first hidden layer with 8 hidden units
  layer_dense(units = 8, activation = 'relu', input_shape = c(15)) %>% 
  # Add regularisation via dropout to the first hidden layer
  layer_dropout(rate = 0.3) %>% 
  # Add layer that connects to the observations
  layer_dense(units = 1, activation = 'sigmoid')
summary(model_1)
Model: "sequential_1"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Layer (type)                      ┃ Output Shape             ┃       Param # ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ dense_3 (Dense)                   │ (None, 8)                │           128 │
├───────────────────────────────────┼──────────────────────────┼───────────────┤
│ dropout_2 (Dropout)               │ (None, 8)                │             0 │
├───────────────────────────────────┼──────────────────────────┼───────────────┤
│ dense_4 (Dense)                   │ (None, 1)                │             9 │
└───────────────────────────────────┴──────────────────────────┴───────────────┘
 Total params: 137 (548.00 B)
 Trainable params: 137 (548.00 B)
 Non-trainable params: 0 (0.00 B)
# Set early stopping 
early_stopping <- callback_early_stopping(monitor="val_loss", patience = 10, restore_best_weights = TRUE)

# Compile model
model_1 %>% compile(loss = 'binary_crossentropy', optimizer = 'adam', metrics = c('accuracy', 'AUC'))

# Fit model
model_fit <- model_1 %>% fit(X_train, y_train, epochs = 200, batch_size = 50, validation_split = 0.2, callbacks = list(early_stopping))
Epoch 1/200
56/56 - 2s - 28ms/step - AUC: 0.5204 - accuracy: 0.5362 - loss: 0.6771 - val_AUC: 0.5876 - val_accuracy: 0.6483 - val_loss: 0.6524
Epoch 2/200
56/56 - 0s - 2ms/step - AUC: 0.6742 - accuracy: 0.6395 - loss: 0.6286 - val_AUC: 0.7566 - val_accuracy: 0.6657 - val_loss: 0.6030
Epoch 3/200
56/56 - 0s - 3ms/step - AUC: 0.8006 - accuracy: 0.7011 - loss: 0.5796 - val_AUC: 0.8775 - val_accuracy: 0.7091 - val_loss: 0.5491
Epoch 4/200
56/56 - 0s - 2ms/step - AUC: 0.8804 - accuracy: 0.7634 - loss: 0.5227 - val_AUC: 0.9337 - val_accuracy: 0.7815 - val_loss: 0.4867
... (epochs 5–147 omitted; val_loss decreases steadily from 0.4307 to 0.2401) ...
Epoch 148/200
56/56 - 0s - 2ms/step - AUC: 0.9701 - accuracy: 0.9236 - loss: 0.2186 - val_AUC: 0.9660 - val_accuracy: 0.9146 - val_loss: 0.2402
Epoch 149/200
56/56 - 0s - 2ms/step - AUC: 0.9685 - accuracy: 0.9196 - loss: 0.2272 - val_AUC: 0.9662 - val_accuracy: 0.9146 - val_loss: 0.2399
Epoch 150/200
56/56 - 0s - 2ms/step - AUC: 0.9679 - accuracy: 0.9228 - loss: 0.2246 - val_AUC: 0.9661 - val_accuracy: 0.9146 - val_loss: 0.2403
plot(model_fit)

suppressMessages(library(pROC))
# Evaluating on training data
results_train_model_1 <- model_1 %>% evaluate(X_train, y_train)
108/108 - 0s - 933us/step - AUC: 0.9736 - accuracy: 0.9319 - loss: 0.2046
print(results_train_model_1)
$AUC
[1] 0.9735883

$accuracy
[1] 0.9319038

$loss
[1] 0.2046184
results_test_model_1 <- model_1 %>% evaluate(X_test, y_test)
36/36 - 0s - 1ms/step - AUC: 0.9709 - accuracy: 0.9339 - loss: 0.2058
print(results_test_model_1)
$AUC
[1] 0.9708707

$accuracy
[1] 0.9339131

$loss
[1] 0.2057929
# Predictions on test data
y_prob_hat_test_1 <- model_1 %>% predict(X_test)
36/36 - 0s - 2ms/step
threshold <- 0.5 # Predict spam if probability > threshold
y_hat_test_1 <- as.factor(y_prob_hat_test_1 > threshold) # use model_1's own predictions
levels(y_hat_test_1) <- c("not spam", "spam")
confusionMatrix(data = y_hat_test_1, test_spam, positive = "spam")
Confusion Matrix and Statistics

          Reference
Prediction not spam spam
  not spam      669   51
  spam           30  400
                                          
               Accuracy : 0.9296          
                 95% CI : (0.9132, 0.9437)
    No Information Rate : 0.6078          
    P-Value [Acc > NIR] : < 2e-16         
                                          
                  Kappa : 0.851           
                                          
 Mcnemar's Test P-Value : 0.02627         
                                          
            Sensitivity : 0.8869          
            Specificity : 0.9571          
         Pos Pred Value : 0.9302          
         Neg Pred Value : 0.9292          
             Prevalence : 0.3922          
         Detection Rate : 0.3478          
   Detection Prevalence : 0.3739          
      Balanced Accuracy : 0.9220          
                                          
       'Positive' Class : spam            
                                          
# ROC curve
par(pty="s")
roc_obj_1 <- roc(response = test_spam, predictor = as.vector(y_prob_hat_test_1), print.auc = TRUE, percent=TRUE)
Setting levels: control = not spam, case = spam
Setting direction: controls < cases
plot(roc_obj_1, legacy.axes = TRUE, percent=TRUE, col = "coral", main = "ROC spam email classifiers", print.auc=TRUE, print.auc.pattern = "AUC: %0.3f%%", auc.polygon=TRUE)

In terms of AUC and loss, the model with two hidden layers performed slightly better, but the difference was not substantial, and both models reached almost the same test accuracy of around 93%. The problem is likely not complex enough for the deeper (two-hidden-layer) network to meaningfully outperform the one-hidden-layer model.
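The test-set numbers reported by `evaluate()` above can be collected into a small comparison table (values copied from the printed output):

```r
# Test-set metrics for the two architectures, taken from evaluate() above
comparison <- data.frame(
  model     = c("2 hidden layers (12, 6)", "1 hidden layer (8)"),
  test_AUC  = c(0.9721, 0.9709),
  test_acc  = c(0.9296, 0.9339),
  test_loss = c(0.1938, 0.2058)
)
print(comparison)
```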

Problem 2. Deep learning for bike rental data (regression)

# Remove data
rm(list=ls()) # Remove variables
cat("\014") # Clean workspace
# Load data
suppressMessages(library(dplyr))
suppressMessages(library(splines))
bike_data <- read.csv('/Users/thangtm589/Desktop/UTS/37401 Machine Learning/Computer Lab/Lab 3/bike_rental_hourly.csv')

# Design data
bike_data$log_cnt <- log(bike_data$cnt)
bike_data$hour <- bike_data$hr/23 # transform [0, 23] to [0, 1]. 0 is midnight, 1 is 11 PM

# One hot for weathersit
one_hot_encode_weathersit <- model.matrix(~ as.factor(weathersit) - 1,data = bike_data)
one_hot_encode_weathersit  <- one_hot_encode_weathersit[, -1] # Remove reference category
colnames(one_hot_encode_weathersit) <- c('cloudy', 'light rain', 'heavy rain')
bike_data <- cbind(bike_data, one_hot_encode_weathersit)

# One hot for weekday
one_hot_encode_weekday <- model.matrix(~ as.factor(weekday) - 1,data = bike_data)
one_hot_encode_weekday  <- one_hot_encode_weekday[, -1] # Remove reference category
colnames(one_hot_encode_weekday) <- c('Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat')
bike_data <- cbind(bike_data, one_hot_encode_weekday)

# One hot for season
one_hot_encode_season <- model.matrix(~ as.factor(season) - 1,data = bike_data)
one_hot_encode_season  <- one_hot_encode_season[, -1] # Remove reference category
colnames(one_hot_encode_season) <- c('Spring', 'Summer', 'Fall')
bike_data <- cbind(bike_data, one_hot_encode_season)
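The pattern used in the three blocks above, `model.matrix(~ as.factor(x) - 1)` followed by dropping the first column, creates one 0/1 indicator per level and makes the dropped level the implicit reference category. A minimal illustration with a toy factor (hypothetical level names, not from the data):

```r
f <- factor(c("clear", "cloudy", "rain", "cloudy"))
m <- model.matrix(~ f - 1) # one indicator column per level
m <- m[, -1]               # drop the first level ("clear") as the reference
m                          # rows of all zeros correspond to the reference level
```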

# Create lags
bike_data_new <- mutate(bike_data, lag1 = lag(log_cnt, 1), lag2 = lag(log_cnt, 2),
                        lag3 = lag(log_cnt, 3), lag4 = lag(log_cnt, 4), lag24 = lag(log_cnt, 24))

bike_data_new <- bike_data_new[-c(1:24),] # Lost 24 obs because of lagging
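`dplyr::lag` shifts a series and pads the start with `NA`, so after creating `lag24` the first 24 rows each contain at least one `NA`; that is why they are removed above. A small illustration:

```r
suppressMessages(library(dplyr))
x <- c(1, 2, 3, 4, 5)
lag(x, 2) # NA NA 1 2 3: the first k entries of lag(x, k) are NA
```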

# Create training and test data
bike_all_data_train <- bike_data_new[bike_data_new$dteday >= as.Date("2011-01-01") & bike_data_new$dteday <=  as.Date("2012-05-31"), ]
bike_all_data_test <- bike_data_new[bike_data_new$dteday >= as.Date("2012-06-01") & bike_data_new$dteday <=  as.Date("2012-12-31"), ]
X_train <- bike_all_data_train[, c("lag1", "lag2",  "lag3", "lag4", "lag24")]
spline_basis <- ns(bike_all_data_train$hour, df = 10, intercept = FALSE)
X_train <- cbind(X_train, spline_basis)
knots <- attr(spline_basis, "knots")
variables_to_keep_in_X <- c("yr", "holiday", "workingday", "temp", "atemp", "hum", "windspeed")
variables_to_keep_in_X <- c(variables_to_keep_in_X, colnames(one_hot_encode_weathersit), colnames(one_hot_encode_weekday), colnames(one_hot_encode_season))
X_train <- cbind(X_train, bike_all_data_train[, variables_to_keep_in_X])

# Training data
X_train <- as.matrix(X_train)
y_train <- bike_all_data_train$log_cnt
# Test data
y_test <- bike_all_data_test$log_cnt
X_test <- bike_all_data_test[, c("lag1", "lag2",  "lag3", "lag4", "lag24")]
spline_basis_test <- ns(bike_all_data_test$hour, df=10, knots=knots, intercept = FALSE)
X_test <- cbind(X_test, spline_basis_test)
X_test <- cbind(X_test, bike_all_data_test[, variables_to_keep_in_X])
X_test <- as.matrix(X_test)
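One subtlety in the block above: `ns()` also records `Boundary.knots` (by default the range of the training `hour` values), and the test-set call relies on `hour` spanning the same [0, 1] range in both sets. Passing the stored boundary knots explicitly, as in this hypothetical sketch, makes the reuse robust regardless of the test-set range:

```r
library(splines)
x_demo <- seq(0, 1, length.out = 100)  # stand-in for the training 'hour' column
basis  <- ns(x_demo, df = 4)
k  <- attr(basis, "knots")             # interior knots from the training basis
bk <- attr(basis, "Boundary.knots")    # boundary knots, here c(0, 1)
# Evaluate the SAME basis functions at new points
basis_new <- ns(c(0.25, 0.5, 0.75), knots = k, Boundary.knots = bk)
dim(basis_new)                         # 3 rows, 4 basis columns
```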

💪 Problem 2.1

Fit a deep learning model with three hidden layers to the bike rental data. The number of units should be, for each layer respectively, 16 (first hidden layer), 8 (second), and 4 (last hidden layer). Use ReLU activation functions in all layers. You are free to choose the optimisation method and settings, and you may add regularisation via dropout and/or early stopping and/or a penalty.

suppressMessages(library(tensorflow))
suppressMessages(library(keras3))
tensorflow::tf$random$set_seed(12345)

# Define the model 
model <- keras_model_sequential() 
model %>% 
  # Add first hidden layer
  layer_dense(units = 16, activation = 'relu', input_shape = c(34), kernel_regularizer = regularizer_l2(l = 0.01)) %>% 
  # Add regularisation via dropout to the first hidden layer
  layer_dropout(rate = 0.2) %>% 
  # Add second hidden layer
  layer_dense(units = 8, activation = 'relu') %>%
  # Add regularisation via dropout to the second hidden layer
  layer_dropout(rate = 0.2) %>%
  # Add third hidden layer
  layer_dense(units = 4, activation = 'relu') %>%
  # Add regularisation via dropout to the third hidden layer
  layer_dropout(rate = 0.2) %>%
  # Add layer that connects to the observations
  layer_dense(units = 1, activation = 'linear')
summary(model)
Model: "sequential_2"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Layer (type)                      ┃ Output Shape             ┃       Param # ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ dense_5 (Dense)                   │ (None, 16)               │           560 │
├───────────────────────────────────┼──────────────────────────┼───────────────┤
│ dropout_3 (Dropout)               │ (None, 16)               │             0 │
├───────────────────────────────────┼──────────────────────────┼───────────────┤
│ dense_6 (Dense)                   │ (None, 8)                │           136 │
├───────────────────────────────────┼──────────────────────────┼───────────────┤
│ dropout_4 (Dropout)               │ (None, 8)                │             0 │
├───────────────────────────────────┼──────────────────────────┼───────────────┤
│ dense_7 (Dense)                   │ (None, 4)                │            36 │
├───────────────────────────────────┼──────────────────────────┼───────────────┤
│ dropout_5 (Dropout)               │ (None, 4)                │             0 │
├───────────────────────────────────┼──────────────────────────┼───────────────┤
│ dense_8 (Dense)                   │ (None, 1)                │             5 │
└───────────────────────────────────┴──────────────────────────┴───────────────┘
 Total params: 737 (2.88 KB)
 Trainable params: 737 (2.88 KB)
 Non-trainable params: 0 (0.00 B)
# Set early stopping 
early_stopping <- callback_early_stopping(monitor="val_loss", patience = 10, restore_best_weights = TRUE)

# Compile model
model %>% compile(loss = 'mse', optimizer = 'adam', metrics = NULL)

# Fit model
model_fit <- model %>% fit(X_train, y_train, epochs = 200, batch_size = 50, validation_split = 0.2, callbacks = list(early_stopping))
Epoch 1/200
197/197 - 2s - 9ms/step - loss: 11.6100 - val_loss: 3.6621
Epoch 2/200
197/197 - 0s - 1ms/step - loss: 4.7720 - val_loss: 3.5106
Epoch 3/200
197/197 - 0s - 1ms/step - loss: 4.0371 - val_loss: 3.4062
Epoch 4/200
197/197 - 0s - 1ms/step - loss: 3.6110 - val_loss: 3.3511
Epoch 5/200
197/197 - 0s - 1ms/step - loss: 3.3654 - val_loss: 2.9811
Epoch 6/200
197/197 - 0s - 1ms/step - loss: 3.1299 - val_loss: 2.9265
Epoch 7/200
197/197 - 0s - 1ms/step - loss: 2.9660 - val_loss: 2.8635
Epoch 8/200
197/197 - 0s - 1ms/step - loss: 2.7615 - val_loss: 2.5680
Epoch 9/200
197/197 - 0s - 2ms/step - loss: 2.5347 - val_loss: 2.1840
Epoch 10/200
197/197 - 0s - 2ms/step - loss: 2.2509 - val_loss: 2.0883
Epoch 11/200
197/197 - 0s - 1ms/step - loss: 2.1488 - val_loss: 1.5333
Epoch 12/200
197/197 - 0s - 1ms/step - loss: 2.0216 - val_loss: 1.2599
Epoch 13/200
197/197 - 0s - 1ms/step - loss: 1.8273 - val_loss: 1.5005
Epoch 14/200
197/197 - 0s - 1ms/step - loss: 1.6968 - val_loss: 1.1464
Epoch 15/200
197/197 - 0s - 1ms/step - loss: 1.5810 - val_loss: 0.9688
Epoch 16/200
197/197 - 0s - 1ms/step - loss: 1.5160 - val_loss: 1.0399
Epoch 17/200
197/197 - 0s - 1ms/step - loss: 1.4723 - val_loss: 0.9494
Epoch 18/200
197/197 - 0s - 1ms/step - loss: 1.3706 - val_loss: 0.9068
Epoch 19/200
197/197 - 0s - 1ms/step - loss: 1.2615 - val_loss: 0.8418
Epoch 20/200
197/197 - 0s - 1ms/step - loss: 1.2042 - val_loss: 0.8256
Epoch 21/200
197/197 - 0s - 2ms/step - loss: 1.1555 - val_loss: 0.7199
Epoch 22/200
197/197 - 0s - 1ms/step - loss: 1.0735 - val_loss: 0.8798
Epoch 23/200
197/197 - 0s - 1ms/step - loss: 1.0145 - val_loss: 0.8292
Epoch 24/200
197/197 - 0s - 1ms/step - loss: 0.9636 - val_loss: 0.6995
Epoch 25/200
197/197 - 0s - 1ms/step - loss: 0.9106 - val_loss: 0.6651
Epoch 26/200
197/197 - 0s - 1ms/step - loss: 0.8322 - val_loss: 0.6258
Epoch 27/200
197/197 - 0s - 1ms/step - loss: 0.8210 - val_loss: 0.5996
Epoch 28/200
197/197 - 0s - 1ms/step - loss: 0.7845 - val_loss: 0.6096
Epoch 29/200
197/197 - 0s - 1ms/step - loss: 0.7755 - val_loss: 0.5728
Epoch 30/200
197/197 - 0s - 1ms/step - loss: 0.7319 - val_loss: 0.5354
Epoch 31/200
197/197 - 0s - 1ms/step - loss: 0.6881 - val_loss: 0.5375
Epoch 32/200
197/197 - 0s - 1ms/step - loss: 0.6897 - val_loss: 0.5714
Epoch 33/200
197/197 - 0s - 1ms/step - loss: 0.6626 - val_loss: 0.5558
Epoch 34/200
197/197 - 0s - 1ms/step - loss: 0.6422 - val_loss: 0.5233
Epoch 35/200
197/197 - 0s - 1ms/step - loss: 0.6246 - val_loss: 0.5229
Epoch 36/200
197/197 - 0s - 1ms/step - loss: 0.6218 - val_loss: 0.4687
Epoch 37/200
197/197 - 0s - 1ms/step - loss: 0.6104 - val_loss: 0.5668
Epoch 38/200
197/197 - 0s - 1ms/step - loss: 0.6002 - val_loss: 0.5023
Epoch 39/200
197/197 - 0s - 1ms/step - loss: 0.5702 - val_loss: 0.4673
Epoch 40/200
197/197 - 0s - 1ms/step - loss: 0.5819 - val_loss: 0.5036
Epoch 41/200
197/197 - 0s - 1ms/step - loss: 0.5588 - val_loss: 0.5356
Epoch 42/200
197/197 - 0s - 1ms/step - loss: 0.5602 - val_loss: 0.4115
Epoch 43/200
197/197 - 0s - 1ms/step - loss: 0.5578 - val_loss: 0.4583
Epoch 44/200
197/197 - 0s - 1ms/step - loss: 0.5456 - val_loss: 0.4513
Epoch 45/200
197/197 - 0s - 1ms/step - loss: 0.5492 - val_loss: 0.4664
Epoch 46/200
197/197 - 0s - 1ms/step - loss: 0.5251 - val_loss: 0.4636
Epoch 47/200
197/197 - 0s - 1ms/step - loss: 0.5307 - val_loss: 0.4263
Epoch 48/200
197/197 - 0s - 1ms/step - loss: 0.5414 - val_loss: 0.4127
Epoch 49/200
197/197 - 0s - 1ms/step - loss: 0.5343 - val_loss: 0.4634
Epoch 50/200
197/197 - 0s - 1ms/step - loss: 0.5145 - val_loss: 0.4470
Epoch 51/200
197/197 - 0s - 1ms/step - loss: 0.5132 - val_loss: 0.4435
Epoch 52/200
197/197 - 0s - 1ms/step - loss: 0.5221 - val_loss: 0.4371
plot(model_fit)

💪 Problem 2.2

Compute the RMSEs for the training and test data.

# Predictions on the training and test data
y_hat_train <- model %>% predict(X_train)
384/384 - 0s - 849us/step
y_hat_test <- model %>% predict(X_test)
160/160 - 0s - 719us/step
# Compute RMSEs
RMSE_training <- sqrt(sum((y_train - y_hat_train)^2)/length(y_train))
RMSE_test <- sqrt(sum((y_test - y_hat_test)^2)/length(y_test))

# Print RMSE
cat(paste0("RMSE Training: ", RMSE_training, "\n",
           "RMSE Test    : ", RMSE_test, "\n"
           ))
RMSE Training: 0.551224134918377
RMSE Test    : 0.591758625775932
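The RMSEs above are on the log-count scale. If an error measure in the original count units is preferred, one option is to exponentiate before comparing; a sketch reusing `y_test` and `y_hat_test` from the chunk above:

```r
# Sketch: test RMSE on the original count scale. Note that exp(y_hat) is a
# biased predictor of the count mean, so this is only a rough comparison.
RMSE_test_counts <- sqrt(mean((exp(y_test) - exp(y_hat_test))^2))
cat("RMSE Test (counts):", RMSE_test_counts, "\n")
```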

💪 Problem 2.3

Plot a time series plot of the response in the original scale (i.e. counts and not log-counts) for the last week of the test data (last \(24\times 7\) observations). In the same figure, plot a time series plot of the fitted values (in the original scale) from Problem 2.1. Comment on the fit.

# Design time series data and choose the last week of the test data
row_to_keep <- c((nrow(bike_all_data_test)-167):nrow(bike_all_data_test))
time_series <- data.frame(bike_all_data_test$dteday, 
                          bike_all_data_test$hr, 
                          exp(y_test),                # Convert data to original scale
                          exp(y_hat_test))            # Convert data to original scale
time_series <- time_series[row_to_keep, ] # Keep last week of data
time_series$datetime <- as.POSIXct(paste(time_series[,1], time_series[,2]), format="%Y-%m-%d %H")

# Change column names
colnames(time_series) <- c("dteday", "hr", "y_test", "y_hat_test", "datetime")

# Plot time series data
suppressMessages(library(ggplot2))

ggplot(data = time_series, aes(x = datetime)) +
  geom_line(aes(y = y_test, colour = "Original"), lwd=1.2) +
  
  # Add line of predicted value 
  geom_line(aes(y = y_hat_test, colour = "Fitted"), lty=1) +
  
  scale_colour_manual("", 
                      breaks = c("Original", "Fitted"),
                      values = c("red", "green")) +
  xlab("Datetime") +
  ylab("Counts") + 
  theme(axis.text.x=element_text(angle=60, hjust=1))

The figure above shows that the model performs poorly: it fails to capture the broad pattern of the true counts, particularly at the high peaks around 27 to 29 December and from 31 December to 1 January. This is a sign of poor generalisation.
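To quantify the visual impression, the test RMSE can be restricted to the plotted week; a sketch reusing `row_to_keep`, `y_test`, and `y_hat_test` from the previous chunk:

```r
# Sketch: log-scale RMSE over only the last week of the test period,
# to be compared with the full-test RMSE from Problem 2.2.
RMSE_last_week <- sqrt(mean((y_test[row_to_keep] - y_hat_test[row_to_keep])^2))
cat("RMSE Test, last week (log scale):", RMSE_last_week, "\n")
```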

💪 Problem 2.4

Propose a better deep learning model than that in Problem 2.1. Add the predictions of your new model to the figure you created in Problem 2.3.

After several experiments, I decided to remove the third hidden layer to reduce the complexity of the model, helping it generalise better rather than memorise the training data.

# Define the model 
model_upgrade <- keras_model_sequential() 
model_upgrade %>% 
  # Add first hidden layer
  layer_dense(units = 16, activation = 'relu', input_shape = c(34), kernel_regularizer = regularizer_l2(l = 0.01)) %>% 
  # Add regularisation via dropout to the first hidden layer
  layer_dropout(rate = 0.2) %>% 
  # Add second hidden layer
  layer_dense(units = 8, activation = 'relu') %>%
  # Add regularisation via dropout to the second hidden layer
  layer_dropout(rate = 0.2) %>%
  # Third hidden layer removed (commented out) to reduce model complexity
  #layer_dense(units = 4, activation = 'relu') %>%
  # Add layer that connects to the observations
  layer_dense(units = 1, activation = 'linear')
summary(model_upgrade)
Model: "sequential_3"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Layer (type)                      ┃ Output Shape             ┃       Param # ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ dense_9 (Dense)                   │ (None, 16)               │           560 │
├───────────────────────────────────┼──────────────────────────┼───────────────┤
│ dropout_6 (Dropout)               │ (None, 16)               │             0 │
├───────────────────────────────────┼──────────────────────────┼───────────────┤
│ dense_10 (Dense)                  │ (None, 8)                │           136 │
├───────────────────────────────────┼──────────────────────────┼───────────────┤
│ dropout_7 (Dropout)               │ (None, 8)                │             0 │
├───────────────────────────────────┼──────────────────────────┼───────────────┤
│ dense_11 (Dense)                  │ (None, 1)                │             9 │
└───────────────────────────────────┴──────────────────────────┴───────────────┘
 Total params: 705 (2.75 KB)
 Trainable params: 705 (2.75 KB)
 Non-trainable params: 0 (0.00 B)
# Set early stopping 
early_stopping <- callback_early_stopping(monitor="val_loss", patience = 10, restore_best_weights = TRUE)

# Compile model
model_upgrade %>% compile(loss = 'mse', optimizer = 'adam', metrics = NULL)

# Fit model
model_fit_upgrade <- model_upgrade %>% fit(X_train, y_train, epochs = 200, batch_size = 50, validation_split = 0.2, callbacks = list(early_stopping))
Epoch 1/200
197/197 - 1s - 7ms/step - loss: 17.1186 - val_loss: 0.9911
Epoch 2/200
197/197 - 0s - 1ms/step - loss: 2.6198 - val_loss: 0.7865
Epoch 3/200
197/197 - 0s - 1ms/step - loss: 2.2576 - val_loss: 0.6202
Epoch 4/200
197/197 - 0s - 1ms/step - loss: 2.0249 - val_loss: 0.5169
Epoch 5/200
197/197 - 0s - 1ms/step - loss: 1.9423 - val_loss: 0.6435
Epoch 6/200
197/197 - 0s - 1ms/step - loss: 1.8633 - val_loss: 0.4769
Epoch 7/200
197/197 - 0s - 1ms/step - loss: 1.7292 - val_loss: 0.4781
Epoch 8/200
197/197 - 0s - 1ms/step - loss: 1.6176 - val_loss: 0.4683
Epoch 9/200
197/197 - 0s - 1ms/step - loss: 1.6485 - val_loss: 0.5003
Epoch 10/200
197/197 - 0s - 1ms/step - loss: 1.5630 - val_loss: 0.4286
Epoch 11/200
197/197 - 0s - 1ms/step - loss: 1.5058 - val_loss: 0.3956
Epoch 12/200
197/197 - 0s - 1ms/step - loss: 1.4058 - val_loss: 0.4511
Epoch 13/200
197/197 - 0s - 1ms/step - loss: 1.3544 - val_loss: 0.3392
Epoch 14/200
197/197 - 0s - 1ms/step - loss: 1.3030 - val_loss: 0.4145
Epoch 15/200
197/197 - 0s - 1ms/step - loss: 1.2670 - val_loss: 0.4345
Epoch 16/200
197/197 - 0s - 1ms/step - loss: 1.1885 - val_loss: 0.3780
Epoch 17/200
197/197 - 0s - 1ms/step - loss: 1.1583 - val_loss: 0.3236
Epoch 18/200
197/197 - 0s - 1ms/step - loss: 1.1049 - val_loss: 0.2717
Epoch 19/200
197/197 - 0s - 1ms/step - loss: 1.0422 - val_loss: 0.2868
Epoch 20/200
197/197 - 0s - 1ms/step - loss: 1.0218 - val_loss: 0.2635
Epoch 21/200
197/197 - 0s - 1ms/step - loss: 0.9439 - val_loss: 0.3018
Epoch 22/200
197/197 - 0s - 1ms/step - loss: 0.9187 - val_loss: 0.2815
Epoch 23/200
197/197 - 0s - 1ms/step - loss: 0.9121 - val_loss: 0.2586
Epoch 24/200
197/197 - 0s - 1ms/step - loss: 0.8490 - val_loss: 0.3024
Epoch 25/200
197/197 - 0s - 1ms/step - loss: 0.8237 - val_loss: 0.3138
Epoch 26/200
197/197 - 0s - 1ms/step - loss: 0.8078 - val_loss: 0.2439
Epoch 27/200
197/197 - 0s - 1ms/step - loss: 0.7353 - val_loss: 0.3246
Epoch 28/200
197/197 - 0s - 1ms/step - loss: 0.7291 - val_loss: 0.2643
Epoch 29/200
197/197 - 0s - 1ms/step - loss: 0.6910 - val_loss: 0.2772
Epoch 30/200
197/197 - 0s - 1ms/step - loss: 0.6672 - val_loss: 0.2599
Epoch 31/200
197/197 - 0s - 1ms/step - loss: 0.6243 - val_loss: 0.2388
Epoch 32/200
197/197 - 0s - 1ms/step - loss: 0.6114 - val_loss: 0.2341
Epoch 33/200
197/197 - 0s - 1ms/step - loss: 0.5849 - val_loss: 0.2648
Epoch 34/200
197/197 - 0s - 1ms/step - loss: 0.5736 - val_loss: 0.2181
Epoch 35/200
197/197 - 0s - 1ms/step - loss: 0.5300 - val_loss: 0.2281
Epoch 36/200
197/197 - 0s - 1ms/step - loss: 0.5321 - val_loss: 0.2649
Epoch 37/200
197/197 - 0s - 1ms/step - loss: 0.4961 - val_loss: 0.2346
Epoch 38/200
197/197 - 0s - 1ms/step - loss: 0.4845 - val_loss: 0.2430
Epoch 39/200
197/197 - 0s - 1ms/step - loss: 0.4767 - val_loss: 0.2031
Epoch 40/200
197/197 - 0s - 1ms/step - loss: 0.4591 - val_loss: 0.2078
Epoch 41/200
197/197 - 0s - 1ms/step - loss: 0.4566 - val_loss: 0.2286
Epoch 42/200
197/197 - 0s - 1ms/step - loss: 0.4430 - val_loss: 0.2219
Epoch 43/200
197/197 - 0s - 1ms/step - loss: 0.4400 - val_loss: 0.2203
Epoch 44/200
197/197 - 0s - 1ms/step - loss: 0.4230 - val_loss: 0.1931
Epoch 45/200
197/197 - 0s - 1ms/step - loss: 0.4086 - val_loss: 0.2030
Epoch 46/200
197/197 - 0s - 1ms/step - loss: 0.4090 - val_loss: 0.2088
Epoch 47/200
197/197 - 0s - 1ms/step - loss: 0.3939 - val_loss: 0.2238
Epoch 48/200
197/197 - 0s - 1ms/step - loss: 0.3844 - val_loss: 0.2121
Epoch 49/200
197/197 - 0s - 1ms/step - loss: 0.3796 - val_loss: 0.2174
Epoch 50/200
197/197 - 0s - 1ms/step - loss: 0.3770 - val_loss: 0.2165
Epoch 51/200
197/197 - 0s - 1ms/step - loss: 0.3770 - val_loss: 0.2152
Epoch 52/200
197/197 - 0s - 1ms/step - loss: 0.3559 - val_loss: 0.2382
Epoch 53/200
197/197 - 0s - 1ms/step - loss: 0.3694 - val_loss: 0.2056
Epoch 54/200
197/197 - 0s - 1ms/step - loss: 0.3744 - val_loss: 0.2119
plot(model_fit_upgrade)

# Predict data
y_hat_test_new <- model_upgrade %>% predict(X_test)
160/160 - 0s - 1ms/step
# Add to time series data
time_series$y_hat_test_new <- exp(y_hat_test_new[row_to_keep])

# Plot 
ggplot(data = time_series, aes(x = datetime)) +
  geom_line(aes(y = y_test, colour = "Original"), lwd=1.2) +
  
  # Add line of predicted value 
  geom_line(aes(y = y_hat_test, colour = "Fitted"), lty=1) +
  
  # Add line of predicted values from the upgraded model
  geom_line(aes(y = y_hat_test_new, colour = "(Upgraded) Fitted"), lty=1) +
  
  scale_colour_manual("", 
                      breaks = c("Original", "Fitted", "(Upgraded) Fitted"),
                      values = c("red", "green", "blue")) +
  xlab("Datetime") +
  ylab("Counts") + 
  theme(axis.text.x=element_text(angle=60, hjust=1))

Problem 3. Deep learning for classifying images

# Clean data
rm(list=ls()) # Remove variables
cat("\014") # Clean workspace
# Load libraries
suppressMessages(library(pracma)) # For image (matrix) rotation
suppressMessages(library(caret))
suppressMessages(library(tensorflow))
suppressMessages(library(keras3))
tensorflow::tf$random$set_seed(12345)

# Load and design data 
mnist <- dataset_mnist()
X_train_array <- mnist$train$x[1:10000, , ]
dim(X_train_array) # 10000x28x28 3D array with 10000 images (each 28-by-28 pixels)
[1] 10000    28    28
y_train_array <- mnist$train$y[1:10000]
length(y_train_array) # 10000 element vector with training labels (0-9)
[1] 10000
X_test_array <- mnist$test$x
y_test_array <- mnist$test$y

X_train <- array_reshape(X_train_array, c(nrow(X_train_array), 784)) # 10000x784 matrix
X_test <- array_reshape(X_test_array, c(nrow(X_test_array), 784))
# rescale to (0, 1)
X_train <- X_train / 255
X_test <- X_test / 255
# One-hot labels
y_train <- to_categorical(y_train_array, 10) # 10000x10 matrix, each row is one-hot (1 for the labelled class and the rest 0)
y_test <- to_categorical(y_test_array, 10)
print(y_train[1, ]) # Represent the label 5 (first element is the label 0)
 [1] 0 0 0 0 0 1 0 0 0 0
set.seed(12345)
tensorflow::tf$random$set_seed(12345)

# Construct model
model_MNIST_2layer <- keras_model_sequential()
model_MNIST_2layer %>%
  # Add first hidden layer
  layer_dense(units = 256, activation = 'relu', input_shape = c(784)) %>%
  # Add regularisation via dropout to the first hidden layer
  layer_dropout(rate = 0.3) %>%
  # Add second hidden layer
  layer_dense(units = 128, activation = 'relu') %>%
  # Add regularisation via dropout to the second hidden layer
  layer_dropout(rate = 0.3) %>%
  # Add layer that connects to the observations
  layer_dense(units = 10, activation = 'softmax')
summary(model_MNIST_2layer)
Model: "sequential_4"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Layer (type)                      ┃ Output Shape             ┃       Param # ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ dense_12 (Dense)                  │ (None, 256)              │       200,960 │
├───────────────────────────────────┼──────────────────────────┼───────────────┤
│ dropout_8 (Dropout)               │ (None, 256)              │             0 │
├───────────────────────────────────┼──────────────────────────┼───────────────┤
│ dense_13 (Dense)                  │ (None, 128)              │        32,896 │
├───────────────────────────────────┼──────────────────────────┼───────────────┤
│ dropout_9 (Dropout)               │ (None, 128)              │             0 │
├───────────────────────────────────┼──────────────────────────┼───────────────┤
│ dense_14 (Dense)                  │ (None, 10)               │         1,290 │
└───────────────────────────────────┴──────────────────────────┴───────────────┘
 Total params: 235,146 (918.54 KB)
 Trainable params: 235,146 (918.54 KB)
 Non-trainable params: 0 (0.00 B)
# Compile model
model_MNIST_2layer %>% compile(loss = 'categorical_crossentropy', optimizer = 'adam', metrics = c('accuracy'))

# Fit model
model_MNIST_2layer_fit <- model_MNIST_2layer %>% fit(X_train, y_train, epochs = 50, batch_size = 100, validation_split = 0.2)
Epoch 1/50
80/80 - 1s - 18ms/step - accuracy: 0.7119 - loss: 0.9145 - val_accuracy: 0.8945 - val_loss: 0.3728
Epoch 2/50
80/80 - 0s - 4ms/step - accuracy: 0.8905 - loss: 0.3618 - val_accuracy: 0.9130 - val_loss: 0.2975
Epoch 3/50
80/80 - 0s - 4ms/step - accuracy: 0.9233 - loss: 0.2671 - val_accuracy: 0.9255 - val_loss: 0.2601
Epoch 4/50
80/80 - 0s - 4ms/step - accuracy: 0.9308 - loss: 0.2203 - val_accuracy: 0.9315 - val_loss: 0.2371
Epoch 5/50
80/80 - 0s - 4ms/step - accuracy: 0.9499 - loss: 0.1750 - val_accuracy: 0.9360 - val_loss: 0.2215
Epoch 6/50
80/80 - 0s - 4ms/step - accuracy: 0.9548 - loss: 0.1518 - val_accuracy: 0.9400 - val_loss: 0.2076
Epoch 7/50
80/80 - 0s - 4ms/step - accuracy: 0.9582 - loss: 0.1344 - val_accuracy: 0.9370 - val_loss: 0.2062
Epoch 8/50
80/80 - 0s - 4ms/step - accuracy: 0.9671 - loss: 0.1126 - val_accuracy: 0.9420 - val_loss: 0.2021
Epoch 9/50
80/80 - 0s - 4ms/step - accuracy: 0.9735 - loss: 0.0948 - val_accuracy: 0.9450 - val_loss: 0.1918
Epoch 10/50
80/80 - 0s - 4ms/step - accuracy: 0.9749 - loss: 0.0801 - val_accuracy: 0.9470 - val_loss: 0.1804
Epoch 11/50
80/80 - 0s - 4ms/step - accuracy: 0.9775 - loss: 0.0752 - val_accuracy: 0.9450 - val_loss: 0.1915
Epoch 12/50
80/80 - 0s - 4ms/step - accuracy: 0.9827 - loss: 0.0590 - val_accuracy: 0.9475 - val_loss: 0.1989
Epoch 13/50
80/80 - 0s - 4ms/step - accuracy: 0.9831 - loss: 0.0577 - val_accuracy: 0.9520 - val_loss: 0.1879
Epoch 14/50
80/80 - 0s - 4ms/step - accuracy: 0.9835 - loss: 0.0541 - val_accuracy: 0.9485 - val_loss: 0.1866
Epoch 15/50
80/80 - 0s - 4ms/step - accuracy: 0.9833 - loss: 0.0485 - val_accuracy: 0.9525 - val_loss: 0.1690
Epoch 16/50
80/80 - 0s - 4ms/step - accuracy: 0.9876 - loss: 0.0411 - val_accuracy: 0.9500 - val_loss: 0.1918
Epoch 17/50
80/80 - 0s - 4ms/step - accuracy: 0.9880 - loss: 0.0380 - val_accuracy: 0.9515 - val_loss: 0.1868
Epoch 18/50
80/80 - 0s - 4ms/step - accuracy: 0.9896 - loss: 0.0348 - val_accuracy: 0.9460 - val_loss: 0.1973
Epoch 19/50
80/80 - 0s - 4ms/step - accuracy: 0.9901 - loss: 0.0331 - val_accuracy: 0.9520 - val_loss: 0.1992
Epoch 20/50
80/80 - 0s - 4ms/step - accuracy: 0.9900 - loss: 0.0304 - val_accuracy: 0.9545 - val_loss: 0.1958
Epoch 21/50
80/80 - 0s - 4ms/step - accuracy: 0.9915 - loss: 0.0304 - val_accuracy: 0.9520 - val_loss: 0.1931
Epoch 22/50
80/80 - 0s - 4ms/step - accuracy: 0.9909 - loss: 0.0271 - val_accuracy: 0.9480 - val_loss: 0.1997
Epoch 23/50
80/80 - 0s - 4ms/step - accuracy: 0.9906 - loss: 0.0253 - val_accuracy: 0.9530 - val_loss: 0.1848
Epoch 24/50
80/80 - 0s - 4ms/step - accuracy: 0.9954 - loss: 0.0207 - val_accuracy: 0.9510 - val_loss: 0.2139
Epoch 25/50
80/80 - 0s - 4ms/step - accuracy: 0.9948 - loss: 0.0199 - val_accuracy: 0.9495 - val_loss: 0.2221
Epoch 26/50
80/80 - 0s - 4ms/step - accuracy: 0.9925 - loss: 0.0235 - val_accuracy: 0.9520 - val_loss: 0.2044
Epoch 27/50
80/80 - 0s - 4ms/step - accuracy: 0.9946 - loss: 0.0186 - val_accuracy: 0.9465 - val_loss: 0.2255
Epoch 28/50
80/80 - 0s - 4ms/step - accuracy: 0.9920 - loss: 0.0227 - val_accuracy: 0.9510 - val_loss: 0.1989
Epoch 29/50
80/80 - 0s - 4ms/step - accuracy: 0.9946 - loss: 0.0179 - val_accuracy: 0.9500 - val_loss: 0.2351
Epoch 30/50
80/80 - 0s - 4ms/step - accuracy: 0.9937 - loss: 0.0175 - val_accuracy: 0.9515 - val_loss: 0.2205
Epoch 31/50
80/80 - 0s - 4ms/step - accuracy: 0.9933 - loss: 0.0231 - val_accuracy: 0.9495 - val_loss: 0.2144
Epoch 32/50
80/80 - 0s - 4ms/step - accuracy: 0.9945 - loss: 0.0167 - val_accuracy: 0.9530 - val_loss: 0.2063
Epoch 33/50
80/80 - 0s - 4ms/step - accuracy: 0.9946 - loss: 0.0152 - val_accuracy: 0.9485 - val_loss: 0.2338
Epoch 34/50
80/80 - 0s - 4ms/step - accuracy: 0.9954 - loss: 0.0171 - val_accuracy: 0.9505 - val_loss: 0.2249
Epoch 35/50
80/80 - 0s - 4ms/step - accuracy: 0.9948 - loss: 0.0145 - val_accuracy: 0.9450 - val_loss: 0.2506
Epoch 36/50
80/80 - 0s - 4ms/step - accuracy: 0.9931 - loss: 0.0187 - val_accuracy: 0.9505 - val_loss: 0.2299
Epoch 37/50
80/80 - 0s - 4ms/step - accuracy: 0.9937 - loss: 0.0166 - val_accuracy: 0.9510 - val_loss: 0.2417
Epoch 38/50
80/80 - 0s - 4ms/step - accuracy: 0.9956 - loss: 0.0154 - val_accuracy: 0.9550 - val_loss: 0.2101
Epoch 39/50
80/80 - 0s - 4ms/step - accuracy: 0.9961 - loss: 0.0131 - val_accuracy: 0.9560 - val_loss: 0.2257
Epoch 40/50
80/80 - 0s - 4ms/step - accuracy: 0.9965 - loss: 0.0111 - val_accuracy: 0.9560 - val_loss: 0.2296
Epoch 41/50
80/80 - 0s - 4ms/step - accuracy: 0.9958 - loss: 0.0124 - val_accuracy: 0.9565 - val_loss: 0.2366
Epoch 42/50
80/80 - 0s - 4ms/step - accuracy: 0.9969 - loss: 0.0105 - val_accuracy: 0.9540 - val_loss: 0.2409
Epoch 43/50
80/80 - 0s - 4ms/step - accuracy: 0.9955 - loss: 0.0135 - val_accuracy: 0.9545 - val_loss: 0.2234
Epoch 44/50
80/80 - 0s - 4ms/step - accuracy: 0.9933 - loss: 0.0202 - val_accuracy: 0.9470 - val_loss: 0.2538
Epoch 45/50
80/80 - 0s - 4ms/step - accuracy: 0.9956 - loss: 0.0132 - val_accuracy: 0.9560 - val_loss: 0.2314
Epoch 46/50
80/80 - 0s - 4ms/step - accuracy: 0.9965 - loss: 0.0126 - val_accuracy: 0.9530 - val_loss: 0.2271
Epoch 47/50
80/80 - 0s - 4ms/step - accuracy: 0.9969 - loss: 0.0092 - val_accuracy: 0.9535 - val_loss: 0.2338
Epoch 48/50
80/80 - 0s - 4ms/step - accuracy: 0.9971 - loss: 0.0100 - val_accuracy: 0.9510 - val_loss: 0.2927
Epoch 49/50
80/80 - 0s - 4ms/step - accuracy: 0.9959 - loss: 0.0126 - val_accuracy: 0.9520 - val_loss: 0.2892
Epoch 50/50
80/80 - 0s - 4ms/step - accuracy: 0.9965 - loss: 0.0091 - val_accuracy: 0.9560 - val_loss: 0.2781
plot(model_MNIST_2layer_fit)

# Predict
y_pred_test_dl_2layer <- model_MNIST_2layer %>% predict(X_test)
313/313 - 0s - 1ms/step
# Find an example the classifier is unsure about
indices <- which(rowSums(y_pred_test_dl_2layer <= 0.55 & y_pred_test_dl_2layer >= 0.45) == 2) # Observations with two class probabilities in the interval [0.45, 0.55]
ind <- indices[1] # Taking the first
barplot(names.arg = 0:9, y_pred_test_dl_2layer[ind, ], col = "cornflowerblue", ylim = c(0, 1), main = paste("Predicted probs of test image ", ind, sep = ""))

cat("Actual label: ", which.max(y_test[ind, ]) - 1, ", Predicted label:", which.max(y_pred_test_dl_2layer[ind, ]) - 1, sep = "")
Actual label: 9, Predicted label:9

💪 Problem 3.1

Is the model over- or underfitting the data? Explain. Given the same model structure, propose a fix to the issue and implement it.

The Accuracy and Loss figure shows that the gap between the training and validation losses widens after roughly the 15th epoch, while the validation loss tends to increase. This means the model is overfitting.

Therefore, we can apply L1 regularisation, add early stopping with patience = 15, and slightly increase the dropout rate to counter the overfitting, as follows:

model_MNIST_2layer_1 <- keras_model_sequential()
model_MNIST_2layer_1 %>%
  # Add first hidden layer
  layer_dense(units = 256, activation = 'relu', input_shape = c(784), kernel_regularizer = regularizer_l1(l = 0.01)) %>%
  # Add regularisation via dropout to the first hidden layer
  layer_dropout(rate = 0.4) %>%
  # Add second hidden layer
  layer_dense(units = 128, activation = 'relu') %>%
  # Add regularisation via dropout to the second hidden layer
  layer_dropout(rate = 0.4) %>%
  # Add layer that connects to the observations
  layer_dense(units = 10, activation = 'softmax')
summary(model_MNIST_2layer_1)
Model: "sequential_5"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Layer (type)                      ┃ Output Shape             ┃       Param # ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ dense_15 (Dense)                  │ (None, 256)              │       200,960 │
├───────────────────────────────────┼──────────────────────────┼───────────────┤
│ dropout_10 (Dropout)              │ (None, 256)              │             0 │
├───────────────────────────────────┼──────────────────────────┼───────────────┤
│ dense_16 (Dense)                  │ (None, 128)              │        32,896 │
├───────────────────────────────────┼──────────────────────────┼───────────────┤
│ dropout_11 (Dropout)              │ (None, 128)              │             0 │
├───────────────────────────────────┼──────────────────────────┼───────────────┤
│ dense_17 (Dense)                  │ (None, 10)               │         1,290 │
└───────────────────────────────────┴──────────────────────────┴───────────────┘
 Total params: 235,146 (918.54 KB)
 Trainable params: 235,146 (918.54 KB)
 Non-trainable params: 0 (0.00 B)
# Set early stopping
early_stopping <- callback_early_stopping(monitor="val_loss", patience = 15, restore_best_weights = TRUE)

# Compile model
model_MNIST_2layer_1 %>% compile(loss = 'categorical_crossentropy', optimizer = 'adam', metrics = c('accuracy'))

# Fit model
model_MNIST_2layer_fit_fixed <- model_MNIST_2layer_1 %>% fit(X_train, y_train, epochs = 50, batch_size = 100, validation_split = 0.2, callbacks = list(early_stopping))
Epoch 1/50
80/80 - 0s - 5ms/step - accuracy: 0.9975 - loss: 0.0096 - val_accuracy: 0.9515 - val_loss: 0.2674
Epoch 2/50
80/80 - 0s - 4ms/step - accuracy: 0.9971 - loss: 0.0080 - val_accuracy: 0.9525 - val_loss: 0.2608
Epoch 3/50
80/80 - 0s - 4ms/step - accuracy: 0.9956 - loss: 0.0124 - val_accuracy: 0.9530 - val_loss: 0.2769
Epoch 4/50
80/80 - 0s - 4ms/step - accuracy: 0.9969 - loss: 0.0089 - val_accuracy: 0.9535 - val_loss: 0.2734
Epoch 5/50
80/80 - 0s - 4ms/step - accuracy: 0.9962 - loss: 0.0107 - val_accuracy: 0.9555 - val_loss: 0.2367
Epoch 6/50
80/80 - 0s - 4ms/step - accuracy: 0.9950 - loss: 0.0144 - val_accuracy: 0.9485 - val_loss: 0.2989
Epoch 7/50
80/80 - 0s - 4ms/step - accuracy: 0.9939 - loss: 0.0195 - val_accuracy: 0.9525 - val_loss: 0.2491
Epoch 8/50
80/80 - 0s - 4ms/step - accuracy: 0.9950 - loss: 0.0134 - val_accuracy: 0.9500 - val_loss: 0.2775
Epoch 9/50
80/80 - 0s - 4ms/step - accuracy: 0.9967 - loss: 0.0105 - val_accuracy: 0.9515 - val_loss: 0.2801
Epoch 10/50
80/80 - 0s - 4ms/step - accuracy: 0.9965 - loss: 0.0106 - val_accuracy: 0.9535 - val_loss: 0.2773
Epoch 11/50
80/80 - 0s - 4ms/step - accuracy: 0.9976 - loss: 0.0076 - val_accuracy: 0.9510 - val_loss: 0.2836
Epoch 12/50
80/80 - 0s - 4ms/step - accuracy: 0.9955 - loss: 0.0131 - val_accuracy: 0.9480 - val_loss: 0.2657
Epoch 13/50
80/80 - 0s - 4ms/step - accuracy: 0.9967 - loss: 0.0089 - val_accuracy: 0.9515 - val_loss: 0.2944
Epoch 14/50
80/80 - 0s - 4ms/step - accuracy: 0.9965 - loss: 0.0100 - val_accuracy: 0.9475 - val_loss: 0.2994
Epoch 15/50
80/80 - 0s - 4ms/step - accuracy: 0.9944 - loss: 0.0151 - val_accuracy: 0.9480 - val_loss: 0.3058
Epoch 16/50
80/80 - 0s - 4ms/step - accuracy: 0.9934 - loss: 0.0197 - val_accuracy: 0.9500 - val_loss: 0.2508
Epoch 17/50
80/80 - 0s - 4ms/step - accuracy: 0.9973 - loss: 0.0106 - val_accuracy: 0.9555 - val_loss: 0.2603
Epoch 18/50
80/80 - 0s - 4ms/step - accuracy: 0.9973 - loss: 0.0075 - val_accuracy: 0.9550 - val_loss: 0.2912
Epoch 19/50
80/80 - 0s - 4ms/step - accuracy: 0.9969 - loss: 0.0099 - val_accuracy: 0.9525 - val_loss: 0.2699
Epoch 20/50
80/80 - 0s - 4ms/step - accuracy: 0.9969 - loss: 0.0075 - val_accuracy: 0.9540 - val_loss: 0.2671
plot(model_MNIST_2layer_fit_fixed)

As the new Accuracy and Loss figure shows, once the validation loss stabilises the model stops training early to avoid overfitting. Early stopping also reduces computation time, since fewer epochs are run.
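To confirm where early stopping cut training off, the fit history can be inspected directly; a sketch, assuming (as in keras3) that the history object stores per-epoch values in its `$metrics` list:

```r
# Sketch: epochs actually run, and the epoch whose weights were kept
# (restore_best_weights = TRUE restores the lowest-val_loss epoch).
val_loss <- model_MNIST_2layer_fit_fixed$metrics$val_loss
cat("Epochs run :", length(val_loss), "\n")
cat("Best epoch :", which.min(val_loss), "(val_loss =", min(val_loss), ")\n")
```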

💪 Problem 3.2

Compare the deep learning model with convolutional layers to that in Problem 3.1. Discuss the results.

X_train <- array(X_train_array, c(10000, 28, 28, 1)) # The last dimension is the channel
X_test <- array(X_test_array, c(10000, 28, 28, 1))
# Transform values into [0,1] range
X_train <- X_train / 255
X_test <- X_test / 255
# One-hot labels
y_train <- to_categorical(y_train_array, 10) # 10000x10 matrix, each row is one-hot (1 for the labelled class and the rest 0)
y_test <- to_categorical(y_test_array, 10)

# Define model
model_MNIST_2conv1layer <- keras_model_sequential() %>%
  # First convolutional layer
  layer_conv_2d(filters = 32, kernel_size = c(3,3), activation = 'relu',
                input_shape = c(28, 28, 1)) %>%
  # Second convolutional layer
  layer_conv_2d(filters = 64, kernel_size = c(3,3), activation = 'relu') %>%
  # Add a pooling layer after the second convolutional layer
  layer_max_pooling_2d(pool_size = c(2, 2)) %>%
  # Add regularisation via dropout to the second convolutional layer
  layer_dropout(rate = 0.4) %>%
  # Flatten the output of the preceding layer
  layer_flatten() %>%
  # A third layer fully connected (input has been flattened)
  layer_dense(units = 128, activation = 'relu') %>%
  # Add regularisation via dropout to the preceding layer
  layer_dropout(rate = 0.4) %>%
  # Add layer that connects to the observations
  layer_dense(units = 10, activation = 'softmax')

# Compile model
model_MNIST_2conv1layer %>% compile(loss = 'categorical_crossentropy', optimizer = 'adam', metrics = c('accuracy'))
model_MNIST_2conv1layer_fit <- model_MNIST_2conv1layer %>% fit(X_train, y_train, batch_size = 50, epochs = 15, validation_split = 0.2)
Epoch 1/15
160/160 - 13s - 79ms/step - accuracy: 0.8200 - loss: 0.5726 - val_accuracy: 0.9410 - val_loss: 0.2065
Epoch 2/15
160/160 - 11s - 69ms/step - accuracy: 0.9488 - loss: 0.1760 - val_accuracy: 0.9590 - val_loss: 0.1342
Epoch 3/15
160/160 - 11s - 69ms/step - accuracy: 0.9638 - loss: 0.1163 - val_accuracy: 0.9680 - val_loss: 0.1189
Epoch 4/15
160/160 - 11s - 69ms/step - accuracy: 0.9725 - loss: 0.0870 - val_accuracy: 0.9725 - val_loss: 0.0952
Epoch 5/15
160/160 - 11s - 69ms/step - accuracy: 0.9804 - loss: 0.0614 - val_accuracy: 0.9760 - val_loss: 0.0971
Epoch 6/15
160/160 - 11s - 69ms/step - accuracy: 0.9811 - loss: 0.0557 - val_accuracy: 0.9745 - val_loss: 0.0993
Epoch 7/15
160/160 - 11s - 68ms/step - accuracy: 0.9858 - loss: 0.0489 - val_accuracy: 0.9750 - val_loss: 0.0914
Epoch 8/15
160/160 - 11s - 68ms/step - accuracy: 0.9874 - loss: 0.0377 - val_accuracy: 0.9800 - val_loss: 0.0890
Epoch 9/15
160/160 - 11s - 69ms/step - accuracy: 0.9876 - loss: 0.0329 - val_accuracy: 0.9785 - val_loss: 0.0824
Epoch 10/15
160/160 - 11s - 69ms/step - accuracy: 0.9912 - loss: 0.0297 - val_accuracy: 0.9810 - val_loss: 0.0792
Epoch 11/15
160/160 - 11s - 69ms/step - accuracy: 0.9896 - loss: 0.0326 - val_accuracy: 0.9720 - val_loss: 0.1026
Epoch 12/15
160/160 - 12s - 74ms/step - accuracy: 0.9919 - loss: 0.0261 - val_accuracy: 0.9770 - val_loss: 0.0820
Epoch 13/15
160/160 - 12s - 74ms/step - accuracy: 0.9925 - loss: 0.0242 - val_accuracy: 0.9755 - val_loss: 0.1020
Epoch 14/15
160/160 - 11s - 70ms/step - accuracy: 0.9925 - loss: 0.0219 - val_accuracy: 0.9820 - val_loss: 0.0893
Epoch 15/15
160/160 - 11s - 69ms/step - accuracy: 0.9901 - loss: 0.0290 - val_accuracy: 0.9785 - val_loss: 0.0897
plot(model_MNIST_2conv1layer_fit)

# Predict 
y_hat_test <- model_MNIST_2conv1layer %>% predict(X_test)
313/313 - 3s - 8ms/step

Comparing validation loss and accuracy, the model with convolutional layers clearly performs better. The model in Problem 3.1 reached about 96% validation accuracy with a validation loss around 0.3, while this model achieves about 97.5% and 0.1 respectively. Convolutional layers exploit local connectivity and weight sharing: rather than letting every input affect every output node, they are designed to preserve the spatial relationships between neighbouring pixels, which helps the model generalise. The trade-off is a higher computational cost, since the model is more complex.
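One way to see the effect of weight sharing is to compare parameter counts. The arithmetic below is illustrative: the first line corresponds to the first convolutional layer of the model above, while the dense layer is a hypothetical alternative mapping the flattened 28×28 image to 128 units.

```r
# Parameter count of a 3x3 conv layer with 1 input channel and 32 filters:
# each filter has 3*3*1 weights plus one bias.
conv_params <- 32 * (3 * 3 * 1 + 1)
# A hypothetical dense layer mapping the flattened 28x28 image to 128 units:
dense_params <- 128 * (28 * 28 + 1)
cat("conv:", conv_params, " dense:", dense_params, "\n")  # conv: 320  dense: 100480
```

The convolutional layer uses roughly 300 times fewer parameters for its feature extraction, which is one reason it tends to overfit less on image data.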

💪 Problem 3.3

Just before Problem 3.1, we inspected a special case where the classifier (based on the dense layers) was very uncertain. Compute the predicted class probabilities of that particular case with the new model (based on the convolutional filters) and compare to the previous result.

# Plot the predicted class probabilities of the same test image under the new model
barplot(names.arg = 0:9, y_hat_test[ind, ], col = "cornflowerblue", ylim = c(0, 1), main = paste("Predicted probs of test image ", ind, sep = ""))

cat("Actual label: ", which.max(y_test[ind, ]) - 1, ", Predicted label:", which.max(y_hat_test[ind, ]) - 1, sep = "")
Actual label: 9, Predicted label:9

For this example (which may differ under a different seed), the model with convolutional layers is a clear improvement over the previous one: it reduces the uncertainty and predicts the correct class, whose predicted probability is now markedly higher. This is consistent with the better accuracy and loss statistics in Problem 3.2. The advantage of convolutional layers is that they exploit the spatial relationships in the input, which works well for image data such as ours.

💪 Problem 3.4

Fit a neural network with at least two hidden convolutional layers to the data above. You are free to choose the settings, such as regularisation (dropout and/or early stopping and/or penalty), validation split, optimiser, etc.

suppressMessages(library(grid))
cifar10 <- dataset_cifar10()
class_names <- c('plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck')
X_train_array <- cifar10$train$x[1:10000, , , ] # 10000x32x32x3 matrix
y_train_array <- cifar10$train$y[1:10000]
y_train_labels <- class_names[y_train_array+1]
X_test_array <- cifar10$test$x # 10000x32x32x3 matrix
y_test_array <- cifar10$test$y
y_test_labels <- class_names[y_test_array+1]
# rescale to (0, 1)
X_train <- X_train_array / 255
X_test <- X_test_array / 255
# One-hot labels
y_train <- to_categorical(y_train_array, 10) # 10000x10 matrix, each row is one-hot (1 for the labelled class and the rest 0)
y_test <- to_categorical(y_test_array, 10)

# Plot a dog
obs_to_plot <- which(y_train_labels == "dog")[1] # first dog that appears
# Plot image obs_to_plot (in RGB color) in the training set. First get the rgb
image_nbr <- obs_to_plot
rgb_image <- rgb(X_train[image_nbr, , ,1], X_train[image_nbr, , ,2], X_train[image_nbr, , ,3])
dim(rgb_image) <- dim(X_train[image_nbr, , ,1])
grid.newpage()
grid.raster(rgb_image, interpolate=FALSE)

Since high accuracy is not expected here, I deliberately reduced the number of filters, set a small patience, capped training at 20 epochs and added an early stopping condition to keep the computational cost down.

# Define model
model_rgb_2conv1layer <- keras_model_sequential() %>%
  # First convolutional layer
  layer_conv_2d(filters = 16, kernel_size = c(3,3), activation = 'relu',
                input_shape = c(32, 32, 3)) %>%
  # Second convolutional layer
  layer_conv_2d(filters = 32, kernel_size = c(3,3), activation = 'relu') %>%
  # Add a pooling layer after the second convolutional layer
  layer_max_pooling_2d(pool_size = c(2, 2)) %>%
  # Add regularisation via dropout to the second convolutional layer
  layer_dropout(rate = 0.4) %>%
  # Flatten the output of the preceding layer
  layer_flatten() %>%
  # A third layer fully connected (input has been flattened)
  layer_dense(units = 64, activation = 'relu') %>%
  # Add regularisation via dropout to the preceding layer
  layer_dropout(rate = 0.4) %>%
  # Add layer that connects to the observations
  layer_dense(units = 10, activation = 'softmax')

# Set early stopping
early_stopping <- callback_early_stopping(monitor="val_loss", patience = 5, restore_best_weights = TRUE)

# Compile model
model_rgb_2conv1layer %>% compile(loss = 'categorical_crossentropy', optimizer = 'adam', metrics = c('accuracy'))
model_rgb_2conv1layer_fit <- model_rgb_2conv1layer %>% fit(X_train, y_train, batch_size = 100, epochs = 20, validation_split = 0.2, callbacks = list(early_stopping))
Epoch 1/20
80/80 - 7s - 90ms/step - accuracy: 0.1915 - loss: 2.1641 - val_accuracy: 0.3090 - val_loss: 1.9054
Epoch 2/20
80/80 - 6s - 71ms/step - accuracy: 0.3025 - loss: 1.8880 - val_accuracy: 0.4125 - val_loss: 1.6740
Epoch 3/20
80/80 - 6s - 72ms/step - accuracy: 0.3638 - loss: 1.7209 - val_accuracy: 0.4540 - val_loss: 1.5443
Epoch 4/20
80/80 - 6s - 71ms/step - accuracy: 0.4075 - loss: 1.6343 - val_accuracy: 0.4740 - val_loss: 1.4717
Epoch 5/20
80/80 - 6s - 71ms/step - accuracy: 0.4365 - loss: 1.5510 - val_accuracy: 0.4935 - val_loss: 1.4313
Epoch 6/20
80/80 - 6s - 71ms/step - accuracy: 0.4552 - loss: 1.4944 - val_accuracy: 0.5030 - val_loss: 1.3676
Epoch 7/20
80/80 - 6s - 70ms/step - accuracy: 0.4671 - loss: 1.4488 - val_accuracy: 0.5135 - val_loss: 1.3349
Epoch 8/20
80/80 - 6s - 71ms/step - accuracy: 0.4955 - loss: 1.3909 - val_accuracy: 0.5065 - val_loss: 1.3446
Epoch 9/20
80/80 - 6s - 71ms/step - accuracy: 0.4934 - loss: 1.3668 - val_accuracy: 0.5155 - val_loss: 1.3124
Epoch 10/20
80/80 - 6s - 70ms/step - accuracy: 0.5115 - loss: 1.3393 - val_accuracy: 0.5330 - val_loss: 1.2777
Epoch 11/20
80/80 - 6s - 69ms/step - accuracy: 0.5310 - loss: 1.2950 - val_accuracy: 0.5130 - val_loss: 1.3117
Epoch 12/20
80/80 - 6s - 74ms/step - accuracy: 0.5378 - loss: 1.2632 - val_accuracy: 0.5240 - val_loss: 1.2745
Epoch 13/20
80/80 - 6s - 74ms/step - accuracy: 0.5541 - loss: 1.2273 - val_accuracy: 0.5450 - val_loss: 1.2503
Epoch 14/20
80/80 - 6s - 72ms/step - accuracy: 0.5476 - loss: 1.2179 - val_accuracy: 0.5490 - val_loss: 1.2398
Epoch 15/20
80/80 - 6s - 71ms/step - accuracy: 0.5666 - loss: 1.1819 - val_accuracy: 0.5590 - val_loss: 1.2249
Epoch 16/20
80/80 - 5s - 67ms/step - accuracy: 0.5699 - loss: 1.1597 - val_accuracy: 0.5390 - val_loss: 1.2269
Epoch 17/20
80/80 - 6s - 71ms/step - accuracy: 0.5809 - loss: 1.1361 - val_accuracy: 0.5485 - val_loss: 1.2295
Epoch 18/20
80/80 - 6s - 72ms/step - accuracy: 0.5854 - loss: 1.1225 - val_accuracy: 0.5500 - val_loss: 1.2388
Epoch 19/20
80/80 - 6s - 72ms/step - accuracy: 0.5928 - loss: 1.1008 - val_accuracy: 0.5610 - val_loss: 1.2130
Epoch 20/20
80/80 - 6s - 72ms/step - accuracy: 0.5903 - loss: 1.0908 - val_accuracy: 0.5525 - val_loss: 1.2305
plot(model_rgb_2conv1layer_fit)

💪 Problem 3.5

Compute the confusion matrix for the test data. Out of the images that are horses, which two predicted classes are the most common when the classifier is wrong?

# Predict data 
y_hat_test <- model_rgb_2conv1layer %>% predict(X_test)
313/313 - 2s - 6ms/step
# Transform data
label_y_test <- class_names[apply(y_test, 1, which.max)]
label_y_hat_test <- class_names[apply(y_hat_test, 1, which.max)]

# Compute the confusion matrix for the test data
confusion_matrix <- table(label_y_test, label_y_hat_test, dnn = c("Actual", "Prediction"))
print(confusion_matrix)
       Prediction
Actual  bird car cat deer dog frog horse plane ship truck
  bird   324   6  70  212  81   86    75    96   34    16
  car      7 674   4   10   2   10    19    55   57   162
  cat     68  18 338  106 175   99    99    29   29    39
  deer   120   9  46  493  27   95   142    50    7    11
  dog     80   9 179   92 408   47   129    18   20    18
  frog    40  12  73   96  13  694    31     8   10    23
  horse   24   6  39   81  50   12   730    22    5    31
  plane   48  27  22   30   4   11    15   671  122    50
  ship     8  64  16    5   4    7    11   140  699    46
  truck    7 123   8    8   3   17    43    63   52   676
# Get the answer
horse_row <- confusion_matrix["horse", ]
horse_row_without_horse <- horse_row[names(horse_row) != "horse"]
sorted_horse_row <- sort(horse_row_without_horse, decreasing = TRUE)

# Print the answer
cat(paste0("Out of the images that are horses, two most common predicted classes when the classifier is wrong are:", "\n",
           names(sorted_horse_row[1]),": ", sorted_horse_row[1], " times\n",
           names(sorted_horse_row[2]),": ", sorted_horse_row[2], " times\n"
           ))
Out of the images that are horses, two most common predicted classes when the classifier is wrong are:
deer: 81 times
dog: 50 times
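Since the confusion matrix has actual classes in rows and predictions in columns, per-class recall can be read off as the diagonal divided by the row sums. A minimal sketch on a small synthetic 3-class matrix (the counts and class labels are made up for illustration):

```r
# Per-class recall from a confusion matrix (rows = actual, cols = predicted).
cm <- matrix(c(8, 1, 1,
               2, 6, 2,
               0, 3, 7), nrow = 3, byrow = TRUE,
             dimnames = list(Actual = c("a", "b", "c"),
                             Prediction = c("a", "b", "c")))
recall <- diag(cm) / rowSums(cm)
round(recall, 2)  # a: 0.8, b: 0.6, c: 0.7
```

The same expression applied to `confusion_matrix` above would show, for instance, how often horses are recognised as horses.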

💪 Problem 3.6

Find an image in the test data set that the classifier is uncertain about (i.e. no class has close to probability 1). Plot the image and, moreover, plot the predictive distribution of that image (a bar plot with the probability for each of the classes). Did your classifier end up taking the right decision?

# Find test predictions where two classes each receive probability between 0.45 and 0.55
indices <- which(rowSums(y_hat_test <= 0.55 & y_hat_test >= 0.45) == 2)

# Get the first image 
ind <- indices[1]

# Plot the first image that the classifier is uncertain about
rgb_image <- rgb(X_test[ind, , , 1], X_test[ind, , , 2], X_test[ind, , , 3])
dim(rgb_image) <- dim(X_test[ind, , , 1])
grid.newpage()
grid.raster(rgb_image, interpolate=FALSE)

# Plot the predictive distribution 
barplot(names.arg = class_names, y_hat_test[ind, ], col = "cornflowerblue", ylim = c(0, 1), main = paste("Predicted probs of test image ", ind, sep = ""))

# Print result
cat("Actual label: ", label_y_test[ind], ", Predicted label:", label_y_hat_test[ind], sep = "")
Actual label: plane, Predicted label:ship

In my run, the classifier unfortunately took the wrong decision: it remained uncertain and predicted ship for an image whose actual label is plane.

Problem 4. Gaussian process prior

💪 Problem 4.1

Assume \(\sigma_f=1.5\) and \(\ell=0.5\). Use the function above to compute:

  1. The covariance between 0.3 and 0.7.

  2. The covariance between 0.1 and 0.5.

  3. The correlation between -0.2 and -0.5.

Explain why the covariances in 1. and 2. are the same.

# Assign value
sigma_f = 1.5
ell = 0.5

# Compute the squared exponential kernel 
pairwise_cov_squared_exp <- function(x, x_prime, sigma_f, ell) {
  return(sigma_f^2*exp(-1/(2*ell^2)*(x - x_prime)^2))
}

# Compute covariance and correlation
cov1 <- pairwise_cov_squared_exp(0.3, 0.7, sigma_f, ell)
cov2 <- pairwise_cov_squared_exp(0.1, 0.5, sigma_f, ell)
cor3 <- pairwise_cov_squared_exp(-0.2, -0.5, sigma_f, ell) / sigma_f^2

# Print covariance and correlation
cat(paste0("The covariance between 0.3 and 0.7.   : ", cov1, "\n",
           "The covariance between 0.1 and 0.5.   : ", cov2, "\n",
           "The correlation between -0.2 and -0.5 : ", cor3, "\n"
           ))
The covariance between 0.3 and 0.7.   : 1.6338353334158
The covariance between 0.1 and 0.5.   : 1.6338353334158
The correlation between -0.2 and -0.5 : 0.835270211411272

The squared exponential kernel determines the covariance between two points solely through their distance. The absolute differences within the pairs (0.3, 0.7) and (0.1, 0.5) are both 0.4, so the two pairs are equally far apart and their kernel values must be equal.
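This stationarity can be checked directly: the sketch below redefines the kernel locally (with the lab's values \(\sigma_f = 1.5\), \(\ell = 0.5\) hard-coded) and compares the two pairs.

```r
# The kernel is stationary: it depends only on the distance |x - x'|.
k <- function(x, xp) 1.5^2 * exp(-(x - xp)^2 / (2 * 0.5^2))
isTRUE(all.equal(k(0.3, 0.7), k(0.1, 0.5)))  # TRUE, both pairs are 0.4 apart
```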

💪 Problem 4.2

Compute the kernel matrix (on the input values specified below) with the help of the pairwise_cov_squared_exp() function. Interpret the value of row 2 and column 5 in the kernel matrix. Use the following input values when computing the kernel matrix:

X <- seq(-1, 1, length.out = 21)
# Input value
X <- seq(-1, 1, length.out = 21)

# Construct kernel matrix function
kernal_matrix_loop <- function(X, sigma_f, ell) {
  # Declare kernel matrix 
  kernel_matrix <- matrix(0, nrow=length(X), ncol=length(X))
  
  for (i in 1:nrow(kernel_matrix)) {
    for (j in 1:ncol(kernel_matrix)) {
      kernel_matrix[i, j] <- pairwise_cov_squared_exp(X[i], X[j], sigma_f, ell)
    }
  }
  return(kernel_matrix)
}

# Compute kernel matrix
kernel_matrix <- kernal_matrix_loop(X, 1.5, 0.5)

# Print the value of row 2 and column 5 in the kernel matrix
cat(paste0("The value of row 2 and column 5 in the kernel matrix : ", kernel_matrix[2, 5], "\n",
           "X2: ", X[2], "\n",
           "X5: ", X[5], "\n"
           ))
The value of row 2 and column 5 in the kernel matrix : 1.87935797567536
X2: -0.9
X5: -0.6

Considering the 21 inputs above as \((X_1, X_2, \ldots, X_{21})\), the value in row 2 and column 5 of the kernel matrix, 1.879, is the prior covariance between the function values at \(X_2 = -0.9\) and \(X_5 = -0.6\). A larger entry corresponds to a stronger correlation between the function values and, for this kernel, to a shorter distance between \(X_2\) and \(X_5\).

💪 Problem 4.3

Compare the computing times between the two ways of constructing the kernel matrix (i.e. the double for-loop vs the vectorised version). Use the input vector X<-seq(-1,1,length.out=500) when comparing the computing times.

kernel_matrix_squared_exp <- function(X, Xstar, sigma_f, ell) {
  # Computes the kernel matrix for the squared exponential kernel model
  # Compute the pairwise squared Euclidean distances
  pairwise_squared_distances <- outer(X, Xstar, FUN = "-")^2
  # Compute the kernel matrix element-wise
  kernel_matrix <- sigma_f^2*exp(-1/(2*ell^2)*pairwise_squared_distances)
  return(kernel_matrix)
}

# Input value
X <- seq(-1,1,length.out=500)

# Measure computation time
suppressMessages(library(rbenchmark))

# Set seed and benchmark
set.seed(1234)
benchmark(kernal_matrix_loop(X, 1.5, 0.5), kernel_matrix_squared_exp(X, X, 1.5, 0.5))
                                       test replications elapsed relative
1           kernal_matrix_loop(X, 1.5, 0.5)          100  28.343    42.24
2 kernel_matrix_squared_exp(X, X, 1.5, 0.5)          100   0.671     1.00
  user.self sys.self user.child sys.child
1    28.174    0.101          0         0
2     0.474    0.195          0         0

With a much smaller elapsed time (the column of values reported by system.time()) of about 0.7 seconds versus about 28 seconds over 100 replications, the vectorised version is roughly 40 times faster than the double for-loop.
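Before trusting the timing comparison, it is worth confirming that the two implementations actually produce the same matrix. The sketch below uses minimal, self-contained redefinitions of the loop and vectorised kernels from above:

```r
# Sanity check: the loop and vectorised kernel constructions must agree.
k_pair <- function(x, xp, sf, l) sf^2 * exp(-(x - xp)^2 / (2 * l^2))
k_loop <- function(X, sf, l) {
  K <- matrix(0, length(X), length(X))
  for (i in seq_along(X)) for (j in seq_along(X)) K[i, j] <- k_pair(X[i], X[j], sf, l)
  K
}
k_vec <- function(X, Xs, sf, l) sf^2 * exp(-outer(X, Xs, "-")^2 / (2 * l^2))
X_small <- seq(-1, 1, length.out = 10)
all.equal(k_loop(X_small, 1.5, 0.5), k_vec(X_small, X_small, 1.5, 0.5))  # TRUE
```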

💪 Problem 4.4

Play around with the length scale \(\ell\) in the code above. Discuss the role of the length scale and its implication for the bias-variance trade off.

suppressMessages(library(mvtnorm)) # for multivariate normal
n_grid <- 200
X_grid <- seq(-1, 1, length.out = n_grid)
sigma_f <- 1

m_X <- rep(0, n_grid) # Create zero vector

# Assign ell values
ell_list <- c(0.1, 0.3, 1)

set.seed(1234)

# Plot figure for each ell
for (ell in ell_list) {
  K_X_X <- kernel_matrix_squared_exp(X_grid, X_grid, sigma_f, ell)
  GP_realisations <- rmvnorm(n = 5, mean = m_X, sigma = K_X_X)
  
  # Plot the GP
  matplot(X_grid, t(GP_realisations), type = "l", lty = 1, col = c("cornflowerblue", "lightcoral", "green", "black", "purple"), xlab = "x", ylab = "f(x)", main = paste("Simulations from the GP prior with ell=", ell, sep=""), xlim=c(-1, 1.5), ylim=c(-3*sigma_f, 3*sigma_f))
  legend("topright", legend = c("Sim 1", "Sim 2", "Sim 3", "Sim 4", "Sim 5"), col = c("cornflowerblue", "lightcoral", "green", "black", "purple"), lty = 1)
}

The three figures show that a larger length scale \(\ell\) yields smoother curves, while a smaller \(\ell\) allows the function values to change quickly. This is because, at a fixed distance between inputs, a larger \(\ell\) gives a higher covariance, so nearby function values are forced to be similar. If the function values are only weakly correlated (small \(\ell\)), the realisations become very wiggly, which increases the variance of the model. If they are too strongly correlated (large \(\ell\)), the model may be too smooth to capture the full variability of the function, which increases the bias. Choosing \(\ell\) therefore involves a bias-variance trade-off.
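The smoothing effect can be quantified: for the squared exponential kernel, the correlation between \(f(x)\) and \(f(x^\prime)\) at distance \(d\) is \(\exp(-d^2/(2\ell^2))\). Evaluating this at a fixed distance of 0.2 for the three length scales used above:

```r
# Correlation between f(x) and f(x') at distance 0.2 for each length scale.
d <- 0.2
corrs <- sapply(c(0.1, 0.3, 1), function(ell) exp(-d^2 / (2 * ell^2)))
round(corrs, 2)  # 0.14 0.80 0.98
```

At \(\ell = 0.1\) points 0.2 apart are almost uncorrelated (wiggly curves), while at \(\ell = 1\) they are nearly perfectly correlated (smooth curves).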

Problem 5. Gaussian process posterior

💪 Problem 5.1

Derive (analytically) \(\mathbb{E}\left(\mathbf{y}\right)\) and \(\mathrm{Cov}\left(\mathbf{y}\right)\).

Tip

The tower property of expectations is

\[ \mathbb{E}\left(\mathbf{y}\right)=\mathbb{E}_\mathbf{f}\left(\mathbb{E}\left(\mathbf{y}|\mathbf{f}\right)\right). \]

The law of total covariance \[\mathrm{Cov}\left(\mathbf{y}\right)= \mathbb{E}_\mathbf{f}\left(\mathrm{Cov}\left(\mathbf{y}|\mathbf{f}\right)\right)+\mathrm{Cov}_\mathbf{f}\left(\mathbb{E}\left(\mathbf{y}|\mathbf{f}\right)\right).\]

The expectation and covariance of the inner expressions are with respect to the distribution of \(\mathbf{y}|\mathbf{f}\), i.e. treating \(\mathbf{f}\) as known.

Since \(f(x)\) follows a Gaussian process prior, i.e. \(f(x)\sim\mathcal{GP}\left(m(x), k(x,x^\prime)\right)\), and \(\varepsilon\sim N(0,\sigma_{\varepsilon}^2)\), we can derive:

\[
\begin{align*}
\mathbb{E}\left(\mathbf{y}\right) &=\mathbb{E}_\mathbf{f}\left(\mathbb{E}\left(\mathbf{y}|\mathbf{f}\right)\right) \\[5pt]
&=\mathbb{E}_\mathbf{f}\left(\mathbb{E}\left(\mathbf{f} + \boldsymbol{\varepsilon}|\mathbf{f}\right)\right) \\[5pt]
&=\mathbb{E}_\mathbf{f}\left(\mathbb{E}(\mathbf{f}|\mathbf{f}) + \mathbb{E}(\boldsymbol{\varepsilon}|\mathbf{f})\right) \\[5pt]
&=\mathbb{E}_\mathbf{f}\left(\mathbf{f}\right) \quad \text{(since } \mathbb{E}(\mathbf{f}|\mathbf{f})=\mathbf{f} \text{ and } \mathbb{E}(\boldsymbol{\varepsilon}|\mathbf{f})=\mathbf{0}\text{)} \\[5pt]
&=\mathbf{m}(\mathbf{X}).
\end{align*}
\]
\[
\begin{align*}
\mathrm{Cov}\left(\mathbf{y}\right) &=\mathbb{E}_\mathbf{f}\left(\mathrm{Cov}\left(\mathbf{y}|\mathbf{f}\right)\right)+\mathrm{Cov}_\mathbf{f}\left(\mathbb{E}\left(\mathbf{y}|\mathbf{f}\right)\right) \\[5pt]
&=\mathbb{E}_\mathbf{f}\left(\mathrm{Cov}(\mathbf{f}|\mathbf{f}) + \mathrm{Cov}(\boldsymbol{\varepsilon}|\mathbf{f})\right) + \mathrm{Cov}_\mathbf{f}\left(\mathbf{f}\right) \\[5pt]
&=\mathbb{E}_\mathbf{f}\left(\sigma^2_{\varepsilon}\boldsymbol{I}_n\right) + \mathbf{K}(\mathbf{X},\mathbf{X}) \quad \text{(since } \mathrm{Cov}(\mathbf{f}|\mathbf{f})=\mathbf{0} \text{ and } \mathrm{Cov}(\boldsymbol{\varepsilon}|\mathbf{f})=\sigma^2_{\varepsilon}\boldsymbol{I}_n\text{)} \\[5pt]
&=\sigma^2_{\varepsilon}\boldsymbol{I}_n + \mathbf{K}(\mathbf{X},\mathbf{X}).
\end{align*}
\]

💪 Problem 5.2

Predict the Gaussian process on a fine grid, x_grid<-seq(0,1,length.out=1000). In the same figure, plot a scatter of the data, the posterior mean of the Gaussian process, and \(95\%\) probability intervals for the Gaussian process. Explain why your interval does not seem to capture \(95\%\) of the data.

Tip

In the smoothing we did above, \(\mathbf{X}_*=\mathbf{X}\). This is not the case here, which has several implications when using the code above.

load(file = '/Users/thangtm589/Desktop/UTS/37401 Machine Learning/Computer Lab/Lab 3/penguins.RData')
y <- penguins$dive_heart_rate
n <- length(y)
X <- penguins$duration/max(penguins$duration) # Scale duration [0, 1]

plot(X, y, main="DHR vs scaled duration", col = "cornflowerblue", xlab = "Scaled duration", ylab = "Dive heart rate (DHR)")
sigma_f <- 100
ell <- 0.6
sigma_eps <- sqrt(150)

# Case 2: Use other Xstart
X_grid <- seq(0,1,length.out=1000)
# Compute means and kernel matrices
# Prior means
m_X <- rep(0, n)
m_Xgrid <- rep(0, length(X_grid))
# Prior covariances
K_X_X <- kernel_matrix_squared_exp(X, X, sigma_f, ell)
K_X_Xgrid <- kernel_matrix_squared_exp(X, X_grid, sigma_f, ell)
K_Xgrid_X <- t(K_X_Xgrid)
K_Xgrid_Xgrid <- kernel_matrix_squared_exp(X_grid, X_grid, sigma_f, ell)
# Conditional distribution of f given y is normal. 
fbar_grid <- m_Xgrid + K_Xgrid_X%*%solve(K_X_X + sigma_eps^2*diag(n)) %*% (y - m_X)
cov_grid <- K_Xgrid_Xgrid - K_Xgrid_X %*% solve(K_X_X + sigma_eps^2*diag(n)) %*% K_X_Xgrid
lines(X_grid, fbar_grid, col = "purple", type = "l", lwd=3)

# Calculate sigma
sigma_cov <- sqrt(diag(cov_grid))

# Add 95% probability interval
# Define x values for shading (X_grid for this example)
x_shade <- X_grid
# Lower and upper 95% interval around the posterior mean
lower_interval <- fbar_grid - 1.96 * sigma_cov
upper_interval <- fbar_grid + 1.96 * sigma_cov

# Create a polygon to shade the prediction interval (alpha controls transparency)
polygon(c(x_shade, rev(x_shade)), c(lower_interval, rev(upper_interval)), col = rgb(0, 0, 1, alpha = 0.05), border = NA)

# Set legend
legend(x = "topright", pch = c(1, 1), col = c("cornflowerblue", "purple"), legend=c("Data", "Grid (fitted) values"))

The shaded 95% interval represents the uncertainty about the latent function \(f\), not about the observations themselves: it excludes the noise variance \(\sigma_\varepsilon^2\), so it is not expected to contain 95% of the data points. In addition, with the fairly large length scale \(\ell = 0.6\), the posterior mean is smooth and estimated precisely, which keeps the interval narrow.
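To cover new observations \(y_* = f(x_*) + \varepsilon\) rather than just the latent function, the noise variance must be added to the posterior variance of \(f\). A minimal numeric sketch, where `sd_f = 3` is a hypothetical posterior standard deviation of \(f\) at one grid point and `sigma_eps` is the lab's noise level:

```r
# Half-widths of a 95% interval for f versus for a new observation y*.
sigma_eps <- sqrt(150)
sd_f <- 3                              # hypothetical posterior sd of f
sd_y <- sqrt(sd_f^2 + sigma_eps^2)     # predictive sd adds the noise variance
c(f_halfwidth = 1.96 * sd_f, y_halfwidth = 1.96 * sd_y)
```

Applied to the fit above, this would amount to using sqrt(diag(cov_grid) + sigma_eps^2) in place of sigma_cov, giving a much wider band that should capture roughly 95% of the data.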

💪 Problem 5.3

For simplicity, assume that the only unknown parameter is the length scale \(\ell\). Use the optim() function to maximise the log of the marginal likelihood to find the maximum likelihood estimate of \(\ell\). Treat \(\sigma_f\) and \(\sigma_\varepsilon\) as known (fixed at \(\sigma_f=100\) and \(\sigma_\varepsilon=\sqrt{150}\)).

Since \(\mathbf{y}|\boldsymbol{\theta} \sim \mathcal{N}\left(\mathbf{m}(\mathbf{X}), \mathbf{K}(\mathbf{X},\mathbf{X})+\sigma_{\varepsilon}^2\boldsymbol{I}_{n}\right)\) and let \(\lambda\) denote eigenvalue of \(\mathbf{K}(\mathbf{X},\mathbf{X})\), we have the equation of the log of the marginal likelihood as follows: \[ \begin{align*} \log p(\mathbf{y}|\boldsymbol{\theta}) &= \log \Bigg( (2\pi)^{-\frac{n}{2}} \det(\Sigma)^{-\frac{1}{2}} \exp\left( -\frac{1}{2}(\boldsymbol{y} - \mu)^\top \Sigma^{-1} (\boldsymbol{y} - \mu) \right) \Bigg)\\[5pt] &=\log\Bigg((2\pi)^{-\frac{n}{2}} \det(\mathbf{K}(\mathbf{X},\mathbf{X})+\sigma_{\varepsilon}^2\boldsymbol{I}_{n})^{-\frac{1}{2}} \exp\left(-{\frac{1}{2}} (\mathbf{y}-\mathbf{m}(\mathbf{X}))^T(\mathbf{K}(\mathbf{X},\mathbf{X})+\sigma_{\varepsilon}^2\boldsymbol{I}_{n})^{-1}(\mathbf{y}-\mathbf{m}(\mathbf{X}))\right) \Bigg) \\[5pt]&= \log\left((2\pi)^{-\frac{n}{2}}\right) -\frac{1}{2}\log\left(\det(\mathbf{K}(\mathbf{X},\mathbf{X})+\sigma_{\varepsilon}^2\boldsymbol{I}_{n})\right) - \frac{1}{2} (\mathbf{y}-\mathbf{m}(\mathbf{X}))^T(\mathbf{K}(\mathbf{X},\mathbf{X})+\sigma_{\varepsilon}^2\boldsymbol{I}_{n})^{-1}(\mathbf{y}-\mathbf{m}(\mathbf{X})) \\[5pt]&= {-\frac{n}{2}}\log(2\pi)\ -\frac{1}{2} \sum_{i=1}^n\log\left(\lambda_i+\sigma_{\varepsilon}^2 \right) - \frac{1}{2} (\mathbf{y}-\mathbf{m}(\mathbf{X}))^T(\mathbf{K}(\mathbf{X},\mathbf{X})+\sigma_{\varepsilon}^2\boldsymbol{I}_{n})^{-1}(\mathbf{y}-\mathbf{m}(\mathbf{X})) \end{align*} \]

I minimise the negative marginal log likelihood rather than maximising the likelihood itself, since optim() minimises by default. I also assume that \(\mathbf{m}(\mathbf{X})=\mathbf{0}\).

# Construct function to compute negative marginal log likelihood
neg_log_marginal_llh <- function(y, X, sigma_f, sigma_eps, ell) {
  n <- length(y)
  K_X_X <- kernel_matrix_squared_exp(X, X, sigma_f, ell)
  m_X <- rep(0, n)
  capital_sigma <- K_X_X + sigma_eps^2*diag(n)
  lambda <- eigen(K_X_X)$values
  log_p_y <- -(n/2)*log(2*pi) - (1/2) * sum(log(lambda + sigma_eps^2)) - (1/2) * (t(y-m_X) %*% solve(capital_sigma) %*% (y-m_X))
  return(-log_p_y)
}

# Initialize value
ell_start <- 0.6

# Print numbers in decimal rather than scientific notation
options(scipen = 999)

# Use optim() with method = "L-BFGS-B" to estimate ell by minimising the negative marginal log likelihood
optimal_ell <- optim(par = ell_start, fn = neg_log_marginal_llh, method = "L-BFGS-B", y = y, X = X, sigma_f = 100, sigma_eps = sqrt(150))$par

# Print result
cat("Optimal ell with the restriction is: ", optimal_ell)
Optimal ell with the restriction is:  0.597624
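The eigendecomposition above works, but a common, numerically stable alternative computes the log-determinant and the quadratic form from a single Cholesky factorisation. A sketch (assuming a zero prior mean, as in the function above):

```r
# Cholesky-based negative log marginal likelihood for a GP with
# squared exponential kernel and zero prior mean.
neg_llh_chol <- function(y, X, sigma_f, sigma_eps, ell) {
  K <- sigma_f^2 * exp(-outer(X, X, "-")^2 / (2 * ell^2))
  U <- chol(K + sigma_eps^2 * diag(length(y)))  # upper triangular, t(U) %*% U = Sigma
  alpha <- backsolve(U, forwardsolve(t(U), y))  # alpha = Sigma^{-1} y
  0.5 * sum(y * alpha) + sum(log(diag(U))) + 0.5 * length(y) * log(2 * pi)
}
```

Here sum(log(diag(U))) equals half the log-determinant of the covariance matrix, so neither det() nor a full solve() is needed.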

💪 Problem 5.4

Another approach (that does not use the marginal likelihood) to estimate \(\boldsymbol{\theta}\) is via cross-validation. Assume again that the only unknown parameter is \(\ell\). Use \(K=5\) fold cross-validation to estimate \(\ell\).

# Initialize values
sigma_f <- 100
sigma_eps <- sqrt(150)
ell_grid <- seq(0, 1, length.out = 50)[-1] # drop ell = 0, for which the kernel is undefined
K = 5

# Design the k-fold data
ind <- c(1:length(y))
index <- split(ind, ceiling(seq_along(ind) / (length(y)/K)))

# Implement cross-validation
RMSE <- c()
for (ell in ell_grid) {
  RMSE_holdout <- 0
  for (i in c(1:K)) {
    test_row <- index[[i]]
    # Split into training and test data
    X_train <- X[-test_row]
    X_test <- X[test_row]
    y_train <- y[-test_row]
    y_test <- y[test_row]
    identity_matrix <- diag(length(X_train))
    
    # Calculate kernel matrices
    K_X_X <- kernel_matrix_squared_exp(X_train, X_train, sigma_f, ell)
    K_X_Xtest <- kernel_matrix_squared_exp(X_train, X_test, sigma_f, ell)
    
    # Predict y hat value
    y_hat_test = t(K_X_Xtest) %*% solve((sigma_eps^2)*identity_matrix + K_X_X) %*% y_train
    RMSE_holdout <- RMSE_holdout + sqrt(sum((y_test - y_hat_test)^2)/length(y_test))
  }
  RMSE_k_fold <- RMSE_holdout/K
  RMSE <- rbind(RMSE, c(ell, RMSE_k_fold))
}

RMSE <- as.data.frame(RMSE)
colnames(RMSE) <- c('ell', 'rmse')

# Print the ell value with lowest cross-validated RMSE
cat('Ell value resulting in lowest cross-validated RMSE is: ', RMSE[which.min(RMSE$rmse), ]$ell )
Ell value resulting in lowest cross-validated RMSE is:  0.244898

💪 Problem 5.5

Assume now the realistic situation that the full \(\boldsymbol{\theta}\) is unknown, i.e. all parameters \(\sigma_f,\ell,\sigma_\varepsilon\). Estimate them by maximising the log of the marginal likelihood using the optim() function (no cross-validation!). Do your estimates coincides with the values I gave you, i.e. \(\sigma_f=100,\ell=0.6,\sigma_\varepsilon=\sqrt{150}\)?

# Construct function to compute marginal likelihood
neg_log_marginal_llh_1 <- function(y, X, theta) {
  sigma_f <- theta[1]
  sigma_eps <- theta[2]
  ell <- theta[3]
  n <- length(y)
  K_X_X <- kernel_matrix_squared_exp(X, X, sigma_f, ell)
  m_X <- rep(0, n)
  capital_sigma <- K_X_X + sigma_eps^2*diag(n)
  lambda <- eigen(K_X_X)$values
  log_p_y <- -(n/2)*log(2*pi) - (1/2) * sum(log(lambda + sigma_eps^2)) - (1/2) * (t(y-m_X) %*% solve(capital_sigma) %*% (y-m_X))
  return(-log_p_y)
}

# Initialize values
sigma_f <- 100
sigma_eps <- sqrt(150)
ell <- 0.6

# Use the optim() function with method="L-BFGS-B" to learn the parameters
optimal_params <- optim(par = c(sigma_f, sigma_eps, ell), fn = neg_log_marginal_llh_1, method = "L-BFGS-B", y = y, X = X)$par

# Print numbers in decimal rather than scientific notation
options(scipen = 999)

# Print results
cat(paste0("Given parameters are: ", "\n",
           "sigma_f:   ", sigma_f, "\n",
           "sigma_eps: ", sigma_eps, "\n",
           "ell:       ", ell, "\n\n",
           
           "Optimal parameters are: ", "\n",
           "sigma_f:   ", optimal_params[1], "\n",
           "sigma_eps: ", optimal_params[2], "\n",
           "ell:       ", abs(optimal_params[3]), "\n"
           ))
Given parameters are: 
sigma_f:   100
sigma_eps: 12.2474487139159
ell:       0.6

Optimal parameters are: 
sigma_f:   100.002781033724
sigma_eps: 12.2246395117
ell:       0.5975695514989

The values of \(\sigma_f\), \(\sigma_\varepsilon\) and \(\ell\) are almost equal to the given ones.
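Note that optim() is run above without positivity constraints, which is why abs() is applied when printing \(\ell\). A common alternative is to optimise the logs of the parameters, so that positivity is guaranteed by construction. A toy sketch of the reparameterisation with a single parameter and a made-up objective minimised at \(\sigma = 2\):

```r
# Optimise over log(sigma); exp() maps back to the positive scale.
f_toy <- function(log_sigma) (exp(log_sigma) - 2)^2  # minimised at sigma = 2
fit <- optim(par = 0, fn = f_toy, method = "BFGS")
exp(fit$par)  # recovers sigma on the original, positive scale
```

The same trick applied to neg_log_marginal_llh_1() would mean passing log(theta) to optim() and exponentiating inside the function.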